Abstract. Proof-Carrying Code (PCC) is a general framework for verifying the
safety  properties  of  machine-language  programs.  PCC  proofs  are  usually  written  in 
a  logic  extended  with  language-specific  typing  rules;  they  certify  safety  but  only  if 
there  is  no  bug  in  the  typing  rules.  In  Foundational  Proof-Carrying  Code  (FPCC),  on 
the  other  hand,  proofs  are  constructed  and  verified  using  strictly  the  foundations  of 
mathematical  logic,  with  no  type-specific  axioms.  FPCC  is  more  flexible  and  secure 
because  it  is  not  tied  to  any  particular  type  system  and  it  has  a  smaller  trusted 
base. 

Foundational  proofs,  however,  are  much  harder  to  construct.  Previous  efforts  on 
FPCC  all  required  building  sophisticated  semantic  models  for  types.  Furthermore, 
none of them can be easily extended to support mutable fields and higher-order
polymorphism.  In  this  article,  we  present  a  syntactic  approach  to  FPCC  that  avoids 
all  of  these  difficulties.  Under  our  new  scheme,  the  foundational  proof  for  a  typed 
machine program simply consists of the typing derivation plus the formalized
syntactic soundness proof for the underlying type system. The former can be readily
obtained  from  a  type-checker  while  the  latter  is  known  to  be  much  easier  to  construct 
than  the  semantic  soundness  proofs.  We  give  a  translation  from  a  typed  assembly 
language  into  FPCC  and  demonstrate  the  advantages  of  our  new  system  via  an 
implementation  in  the  Coq  proof  assistant. 

Keywords:  foundational  proof-carrying  code,  syntactic  soundness  proof,  typed 
assembly  language 


1.  Introduction 

Proof-Carrying  Code  (PCC),  as  pioneered  by  Necula  and  Lee  [17,  15], 
allows  a  code  producer  to  provide  a  machine-language  program  to  a 
host along with a formal proof of its safety. The proof can be mechanically
checked by the host and the producer need not be trusted because
a  valid  proof  is  a  dependable  certificate  of  safety. 

The  proofs  in  Necula’s  PCC  systems  [16,  7]  are  written  in  a  logic 
extended  with  many  language-specific  typing  rules.  They  can  guarantee 
safety  only  if  there  are  no  bugs  in  the  verification-condition  generator 
(VCgen),  the  typing  rules,  and  the  proof  checker.  The  VCgen  is  a  fairly 
large  program,  so  establishing  its  full  correctness  is  a  daunting  task.  The 
typing  rules  are  also  error-prone:  League  et  al.  [11]  recently  discovered 
a  serious  bug  in  the  Special  J  typing  rules  that  would  undermine  the 
integrity  of  the  entire  PCC-based  system. 

Foundational Proof-Carrying Code (FPCC) [5, 3] tackles these problems
by constructing and verifying its proofs using strictly the foundations
of mathematical logic, with no type-specific axioms. FPCC is
more  flexible  and  secure  because  it  is  not  tied  to  any  particular  type 
system  and  has  a  smaller  trusted  base. 

Foundational proofs, however, are much harder to construct. Previous
efforts on FPCC [5, 9, 1, 6] all required constructing sophisticated
semantic models to reason about types. For example, to support contravariant
recursive types, Appel and Felty [9] initially decided to model
each type as a partial equivalence relation, but later found that building
the actual foundational proofs would “require years of effort implementing
machine-checked proofs of basic results in computability theory” [6,
page 2]. Appel and McAllester [6] later proposed an indexed model
which significantly simplified the proofs but still involves tedious reasoning
about computation steps, with each type being defined as a complex
set of indexed values. More seriously, none of these approaches can
be easily extended to support mutable fields and higher-order polymorphism.
In fact, the only known solution to mutable fields was
proposed very recently by Ahmed et al. [2]; the proposal involves building
a hierarchy of Gödel numberings and making extensive changes to
semantic models used in existing FPCC systems [5, 6].

In  this  article,  we  present  a  syntactic  approach  to  FPCC  that  avoids 
all  of  these  difficulties.  Under  our  new  scheme,  the  foundational  proof 
for  a  typed  machine  program  simply  consists  of  the  typing  derivation 
plus  the  syntactic  soundness  proof  (of  the  underlying  type  system). 
Here  the  typing  derivation  can  be  readily  obtained  from  a  type-checker 
while  the  syntactic  soundness  proof  is  known  to  be  much  easier  to 
construct  than  the  semantic  soundness  proof  [25].  Our  article  makes 
the  following  new  contributions: 

Foundational proofs are widely perceived as extremely hard and
tedious  to  construct,  partly  because  existing  efforts  [5,  9,  1,  6,  2,  21] 
on  FPCC  have  all  adopted  the  semantic  approach  (which  requires 
building  sophisticated  models  from  first  principles).  We  show  that 
this  perception  is  not  true:  with  a  syntactic  approach,  constructing 
foundational  proofs  is  much  simpler  and  more  straightforward. 

As far as we know, our work is the first comprehensive study on
how to use the syntactic approach to generate FPCC. The idea that
attaching the soundness proof (for the underlying type system) can
reduce the trusted base is not new [16, 3]; however, none of the
existing work has shown how to use the syntactic proof to build the
foundational proof. In addition, we show in Sections 3 and 4 that
naively combining existing typed assembly languages (TAL) [14,
13, 26] with their soundness proofs does not necessarily produce
valid FPCC. To make the syntactic approach work, we need to
ensure that a close correspondence can be established between the
TAL and the underlying FPCC machine. This involves developing
a type system for TAL which is not only sound but also
enforces the invariants needed for the FPCC safety proofs.

The relationship between TAL [14] and PCC [17] has never been
made precise even though the two are considered related approaches
for certifying low-level code. In Section 5 we show how
to  translate  each  well-typed  program  in  a  non-trivial  TAL  into 
FPCC.  The  translation  is  interesting  because  it  not  only  shows  the 
connection  between  the  two  but  also  gives  new  insights  on  how  to 
turn  the  expressive  invariants  in  PCC  into  rich  typing  constructs 
in  TAL. 

We show that the syntactic approach to FPCC can support recursive
types, mutable fields, and first-class code pointers without
using the complex constructions required by the semantic approaches.
With  our  recent  results  on  certified  binaries  [20]  and  inductive 
definitions  of  quantified  types  [23],  the  syntactic  approach  offers 
a  more  scalable  alternative  for  compiling  high-level  richly  typed 
programs  into  FPCC. 

Finally,  independent  of  our  results  on  FPCC,  the  typed  assembly 
language  presented  in  Section  4  is  interesting  for  its  own  sake. 
Here our main contribution is a simple technique for type-checking
memory allocation and for maintaining invariants about the allocation
state.

In the rest of this article, we first give a formal definition of FPCC (following
[3]) in Section 2 and present an overview of the requirements for
constructing foundational proofs in Section 3. We then formally define
our sample typed assembly language (called FTAL) in Section 4. In
Sections 5 and 6 we give the detailed translation from FTAL programs
into  FPCC  and  show  how  to  turn  FTAL  typing  derivations  and  the 
(syntactic)  soundness  proof  of  FTAL  into  foundational  proofs.  Finally 
we  compare  our  approach  with  the  semantic  approach,  present  other 
related  work,  and  conclude. 


2.  Foundational  Proof-Carrying  Code 

Unlike type-specialized PCC, foundational PCC avoids any commitment
to a particular type system. The operational semantics of machine
code as well as the concept of safety are defined in a suitably expressive
logic. The code producer must provide both the executable code
and  a  proof  in  the  foundational  logic  that  the  code  satisfies  the  safety 
condition.  Both  the  machine  description  and  the  proof  must  explicitly 
define,  down  to  the  foundations  of  mathematics,  all  required  concepts 
and  prove  any  needed  properties  of  these  concepts. 

2.1.  The  logic 

To encode our safety policies and proofs, we use the calculus of inductive
constructions (CiC) [22, 19]. CiC is an extension of the calculus
of constructions (CC) [8], which is a higher-order typed lambda calculus.
CC corresponds to Church’s higher-order predicate logic via the
Curry-Howard isomorphism [10]. The syntax of CC is:

\[ A, B ::= \mathsf{Set} \mid \mathsf{Type} \mid X \mid \lambda X{:}A.\,B \mid A\,B \mid \Pi X{:}A.\,B \]

The $\lambda$ term corresponds to the abstraction of the lambda calculus, and
the $\Pi$ term is a dependent product type. When the bound variable
does not occur in the body, the product type is usually abbreviated as
$A \to B$. In the terminology of pure type systems, Set and Type are the
sorts.

CiC,  as  its  name  implies,  extends  the  calculus  of  constructions  with 
inductive  definitions.  An  inductive  definition  can  be  written  in  a  syntax 
similar  to  that  of  ML  datatypes.  For  example,  the  following  introduces 
an  inductive  definition  of  natural  numbers  of  kind  schema  Set  with  two 
constructors  of  the  specified  kinds: 

Inductive Nat : Set := zero : Nat | succ : Nat -> Nat
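
\[
\begin{array}{rcl}
r \in \mathit{Regnum} & = & \{\mathsf{r0}, \mathsf{r1}, \ldots, \mathsf{r31}\} \\
w, pc \in \mathit{Word} & = & \{0, 1, \ldots\} \\
M \in \mathit{Mem} & = & \mathit{Word} \to \mathit{Word} \\
R \in \mathit{Regfile} & = & \mathit{Regnum} \to \mathit{Word} \\
S \in \mathit{State} & = & \mathit{Mem} \times \mathit{Regfile} \times \mathit{Word}
\end{array}
\]

\[
\begin{array}{rcl}
\mathit{Instr} \ni \iota & ::= & \mathsf{add}\ r_d, r_s, r_t \mid \mathsf{addi}\ r_d, r_s, w \mid \mathsf{movi}\ r_d, w \\
 & \mid & \mathsf{bgt}\ r_s, r_t, w \mid \mathsf{jd}\ w \mid \mathsf{jmp}\ r \\
 & \mid & \mathsf{ld}\ r_d, r_s(w) \mid \mathsf{st}\ r_d(w), r_s \mid \mathsf{illegal}
\end{array}
\]

Figure 1. Memory, registers, state, and instruction.
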


Inductive  definitions  may  also  be  parameterized  as  in  the  following 
definition  of  polymorphic  lists: 

Inductive List [t : Set] : Set := nil : List t
                                | cons : t -> List t -> List t

The logic also provides elimination constructs for inductive definitions,
which combine case analysis with a fix-point operation. Objects
of  an  inductive  type  can  thus  be  iterated  over  using  these  constructs. 

In  order  for  the  induction  to  be  well-founded  and  for  iterators  to 
terminate,  a  few  constraints  are  imposed  on  the  shape  of  inductive 
definitions;  most  importantly,  the  defined  type  can  only  occur  positively 
in  the  arguments  of  its  constructors.  Mutually  inductive  types  are  also 
supported. 

The calculus of inductive constructions has been shown to be strongly
normalizing [24], hence the corresponding logic is consistent. It is supported
by the Coq proof assistant [22], which we use to implement a
prototype system of the results presented in this article.

In the remainder of this article, we will use more familiar mathematical
notation to present the statement of propositions, rather than
the strict definition of CiC syntax given in this section. For example,
the application of two terms will be written as A(B) and inductive
definitions will be presented in BNF format. We will, however, retain
the $\Pi$ notation, which can generally be read as a universal quantifier.

2.2.  The  machine 

The machine is defined by a machine state and a step function describing
the (deterministic) transition from one machine state to the next.
Figure 1 defines the set of machine states. To simplify the presentation,
we use an idealized 32-register word-addressed machine with an unbounded
memory of words of unlimited size. A machine state is defined
as a tuple of a memory, a register set, and a program counter. The figure
also shows the instruction set, Instr. Informally, the instructions have
the following effects:

add rd,rs,rt     set register rd to the sum of the contents of rs and rt;
addi rd,rs,w     set rd to the sum of w and the contents of rs;
movi rd,w        move an immediate value w into rd;
bgt rs,rt,w      branch to location w if rs > rt;
jd w             unconditional jump to location w;
jmp r            indirect jump to the address in register r;
ld rd,rs(w)      load the contents of location rs + w into rd;
st rd(w),rs      store the contents of rs into location rd + w;
illegal          put the machine in an infinite loop.


Of  course,  these  instructions  are  actually  encoded  as  words  (integers) 
in  the  machine  state.  We  define  Instr  as  an  inductive  type  for  reasons  of 
convenience  since  its  constructors  are  much  easier  to  manipulate  than 
encoded  instruction  words.  Thus,  the  step  function  is  decomposed  into 
a  decoding  function  and  the  specification  of  the  machine’s  operational 
semantics. The decoding function Dc, of type $\mathit{Word} \to \mathit{Instr}$, decodes
a  word  into  the  appropriate  element  of  Instr  (non-decodable  words 
will  result  in  an  illegal  instruction);  we  will  omit  its  exact  definition 
since  it  is  verbose  but  not  interesting.  The  semantics  of  instructions 
is  described  by  the  function  Step  shown  in  Figure  2.  This  function  is 
easily  defined  formally  in  CiC  as  an  iterator  on  the  Instr  type. 
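
The case analysis performed by Step can be sketched in Python as follows. This is only an illustrative model, not the CiC definition used in our implementation: the tuple encoding of instructions, the dictionary representation of memory and registers, and the table-based `toy_decode` function are all assumptions made for the example (the real Dc decodes machine words and is omitted in the text).

```python
from typing import Callable, Dict, Tuple

Instr = Tuple                      # e.g. ("add", rd, rs, rt) or ("illegal",)
Mem = Dict[int, int]               # word-addressed memory; unset cells read as 0
Regfile = Dict[int, int]           # register number (0..31) -> word
State = Tuple[Mem, Regfile, int]   # (M, R, pc)


def step(decode: Callable[[int], Instr], state: State) -> State:
    """One deterministic transition, following the case analysis of Figure 2."""
    mem, regs, pc = state
    ins = decode(mem.get(pc, 0))
    op = ins[0]
    if op == "add":
        _, rd, rs, rt = ins
        return mem, {**regs, rd: regs[rs] + regs[rt]}, pc + 1
    if op == "addi":
        _, rd, rs, w = ins
        return mem, {**regs, rd: regs[rs] + w}, pc + 1
    if op == "movi":
        _, rd, w = ins
        return mem, {**regs, rd: w}, pc + 1
    if op == "bgt":
        _, rs, rt, w = ins
        return mem, regs, (w if regs[rs] > regs[rt] else pc + 1)
    if op == "jd":
        return mem, regs, ins[1]
    if op == "jmp":
        return mem, regs, regs[ins[1]]
    if op == "ld":
        _, rd, rs, w = ins
        return mem, {**regs, rd: mem.get(regs[rs] + w, 0)}, pc + 1
    if op == "st":
        _, rd, w, rs = ins
        return {**mem, regs[rd] + w: regs[rs]}, regs, pc + 1
    # illegal: the machine stays in the same state forever
    return mem, regs, pc


# Toy decoder (an assumption): a lookup table; unknown words decode to illegal.
TABLE = {10: ("movi", 0, 2), 11: ("addi", 0, 0, 3), 12: ("jd", 2)}


def toy_decode(word: int) -> Instr:
    return TABLE.get(word, ("illegal",))


initial: State = ({0: 10, 1: 11, 2: 12}, {r: 0 for r in range(32)}, 0)
after_two = step(toy_decode, step(toy_decode, initial))
```

Each call returns a fresh state rather than mutating its argument, which mirrors the functional update notation $R\{r_d \mapsto \ldots\}$ used in Figure 2.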


2.3.  The  safety  condition 

The  safety  condition  is  a  predicate  expressing  the  fact  that  code  will 
not  “go  wrong.”  We  say  that  a  program  (or,  machine  state  S)  is  safe  if 
every  state  it  can  ever  reach  satisfies  the  safety  policy  SP: 

\[ \mathit{Safe}(S) = \Pi n{:}\mathit{Nat}.\ \mathit{SP}(\mathit{Step}^n(S)) \]

For this presentation, we will define a very basic and simple safety policy
which states that the machine is not stuck on an illegal instruction:

\[ \mathit{SP}(M, R, pc) = (\mathit{Dc}(M(pc)) \neq \mathsf{illegal}) \]

In practice, the safety policy may also include more complex constraints,
such as access controls on regions of memory.
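
To make the quantifier in Safe concrete, the following Python sketch checks the safety policy along the first few steps of an execution. It is a finite approximation only: Safe quantifies over all n, so a bounded check of this kind can refute safety but can never establish it, which is why FPCC requires a proof. The toy machine (a program counter indexing a list of opcode names) and all function names are hypothetical.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")


def safe_up_to(step: Callable[[S], S], sp: Callable[[S], bool],
               s0: S, bound: int) -> bool:
    """Check SP(Step^n(s0)) for n = 0..bound; a finite approximation of Safe."""
    state = s0
    for _ in range(bound + 1):
        if not sp(state):
            return False
        state = step(state)
    return True


def toy_machine(code: List[str]):
    """Hypothetical toy: the state is just a pc into a list of opcode names."""
    def step(pc: int) -> int:
        # "halt" and "illegal" loop in place; everything else falls through.
        return pc if code[pc] in ("halt", "illegal") else pc + 1

    def sp(pc: int) -> bool:
        # The basic policy: the current instruction is not illegal.
        return code[pc] != "illegal"

    return step, sp


good_step, good_sp = toy_machine(["nop", "nop", "halt"])
bad_step, bad_sp = toy_machine(["nop", "illegal"])
```

With these definitions, `safe_up_to(good_step, good_sp, 0, 50)` holds, while the second program already violates the policy after one step.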

\[
\begin{array}{ll}
\text{if } \mathit{Dc}(M(pc)) = & \text{then } \mathit{Step}(M, R, pc) = \\
\hline
\mathsf{add}\ r_d, r_s, r_t & (M,\ R\{r_d \mapsto R(r_s) + R(r_t)\},\ pc + 1) \\
\mathsf{addi}\ r_d, r_s, w & (M,\ R\{r_d \mapsto R(r_s) + w\},\ pc + 1) \\
\mathsf{movi}\ r_d, w & (M,\ R\{r_d \mapsto w\},\ pc + 1) \\
\mathsf{bgt}\ r_s, r_t, w & (M,\ R,\ pc + 1) \text{ when } R(r_s) \le R(r_t) \\
 & (M,\ R,\ w) \text{ when } R(r_s) > R(r_t) \\
\mathsf{jd}\ w & (M,\ R,\ w) \\
\mathsf{jmp}\ r & (M,\ R,\ R(r)) \\
\mathsf{ld}\ r_d, r_s(w) & (M,\ R\{r_d \mapsto M(R(r_s) + w)\},\ pc + 1) \\
\mathsf{st}\ r_d(w), r_s & (M\{R(r_d) + w \mapsto R(r_s)\},\ R,\ pc + 1) \\
\mathsf{illegal} & (M,\ R,\ pc)
\end{array}
\]

Figure 2. Machine semantics.

An FPCC code producer must thus supply an initial state, $S_0$ (which
includes the machine code of the program), and a proof $A$ that this state
satisfies the safety condition. Via the Curry-Howard isomorphism, $A$
can be represented by a term of type $\mathit{Safe}(S_0)$. Thus the FPCC package
is a pair:

\[ F = (S_0 : \mathit{State},\ A : \mathit{Safe}(S_0)) \]


3.  Generating  Proofs 

The actual proof of safety is organized following the approach used
by Appel et al. [5, 6]. We construct an induction hypothesis Inv, also
known as the global invariant, which holds for all states reachable from
the initial state and is strong enough to imply safety. Thus, to show
that our initial state $S_0$ is safe, we provide proofs for the propositions:

Initial Condition: $\mathit{Inv}(S_0)$

Preservation: $\Pi S{:}\mathit{State}.\ \mathit{Inv}(S) \to \mathit{Inv}(\mathit{Step}(S))$

Progress: $\Pi S{:}\mathit{State}.\ \mathit{Inv}(S) \to \mathit{SP}(S)$

These propositions intuitively state that our invariant holds for the
initial state and for every subsequent state during the execution.
Progress establishes that whenever the invariant holds, the safety policy
of the machine is also satisfied. Together, these imply that during the
execution of the program, the safety policy will never be violated. To
prove the initial state is safe, we first use the Initial Condition and
Preservation to show by induction that

\[ \Pi n{:}\mathit{Nat}.\ \mathit{Inv}(\mathit{Step}^n(S_0)). \]

Then $\mathit{Safe}(S_0)$ follows directly by Progress.
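
The shape of this argument can be written down as a short Lean 4 sketch. It is not the Coq development described in this article; the names `iter`, `safe_of_inv`, `init`, `pres`, and `prog` are assumptions chosen for illustration, and the machine is left abstract as an arbitrary type with a step function.

```lean
-- Step^n, defined by recursion on n (illustrative name).
def iter {σ : Type} (step : σ → σ) : Nat → σ → σ
  | 0, s => s
  | n + 1, s => step (iter step n s)

-- Initial Condition, Preservation, and Progress together give safety.
theorem safe_of_inv {σ : Type} (step : σ → σ) (SP Inv : σ → Prop) (s0 : σ)
    (init : Inv s0)
    (pres : ∀ s, Inv s → Inv (step s))
    (prog : ∀ s, Inv s → SP s) :
    ∀ n, SP (iter step n s0) := by
  intro n
  apply prog
  induction n with
  | zero => exact init
  | succ k ih => exact pres _ ih
```

The induction is over the number of steps only; nothing about the instruction set is needed at this level, which is what allows the invariant to be chosen freely.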

Unlike  Appel  et  al.,  who  construct  the  invariant  by  means  of  a 
semantic  model  of  types  at  the  machine  level,  our  approach  is  based 
on  the  use  of  type  soundness  [25]:  We  define  Inv  (S)  to  mean  that 
S  is  “well-formed”  syntactically.  The  well-formedness  property  must 
be  preserved  by  the  step  function,  and  must  imply  safety;  the  proofs 
of  these  properties  are  encoded  in  the  FPCC  logic  as  proof  terms  for 
Preservation  and  Progress. 

In the following sections we show how to derive the notion of well-formedness
for a machine state by relating the state to a type-correct
program in a typed assembly language. The type system of the language
defines a set of inference rules for judgments of the form $\vdash P$, meaning
that the program $P$ is well-formed (type-correct). The dynamic semantics
of the language specifies an evaluation relation $\longmapsto$ on programs;
we use here the term “program” to denote not only code but a more
general configuration fully representing a stage of the evaluation. The
syntactic approach to proving soundness of a type system involves
proving progress (if $\vdash P$, then $P$ is not stuck, i.e., there exists $P'$ such
that $P \longmapsto P'$) and preservation (if $\vdash P$ and $P \longmapsto P'$, then $\vdash P'$).

The central idea of our approach to FPCC is to find a typed assembly
language and a translation relation $\Rightarrow$ between its programs
and machine states, such that type-correct programs are mapped to
well-formed states, and the evaluation relation is related to the step
function; that is, if $P \Rightarrow S$ and $P \longmapsto P'$, then $P' \Rightarrow \mathit{Step}(S)$. If these
properties hold, we can define the invariant $\mathit{Inv}(S)$ as simply stating
that there exists a type-correct program $P$ such that $P \Rightarrow S$. Then
the proofs of progress and preservation for the type system (encoded in
the FPCC logic) can be used to construct straightforward proofs of the
corresponding propositions needed for the safety proof for $S_0$. Further
details of the construction of proof terms are provided in Section 5.
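
As a sketch of how the pieces fit, the following Lean 4 fragment derives the machine-level Preservation property from progress and preservation of the type system together with the correspondence between evaluation and Step. The statement is schematic and the identifiers (`WT` for $\vdash P$, `eval` for $\longmapsto$, `tr` for $\Rightarrow$, `simulation` for the correspondence property) are assumptions made for this illustration, not names from our implementation.

```lean
theorem inv_preserved {Prog State : Type}
    (WT : Prog → Prop)               -- ⊢ P
    (eval : Prog → Prog → Prop)      -- P ⟼ P'
    (tr : Prog → State → Prop)       -- P ⇒ S
    (step : State → State)
    (progress : ∀ P, WT P → ∃ P', eval P P')
    (preservation : ∀ P P', WT P → eval P P' → WT P')
    (simulation : ∀ P P' S, tr P S → eval P P' → tr P' (step S)) :
    -- Inv S is "there exists a well-typed P with P ⇒ S"
    ∀ S, (∃ P, WT P ∧ tr P S) → ∃ P', WT P' ∧ tr P' (step S) := by
  intro S h
  cases h with
  | intro P hP =>
    cases progress P hP.1 with
    | intro P' hev =>
      exact ⟨P', preservation P P' hP.1 hev, simulation P P' S hP.2 hev⟩
```

Progress of the type system is used here only to obtain the next program $P'$; the machine-level Progress property additionally needs the fact that a well-formed program never translates to a state whose current instruction is illegal.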

This method imposes requirements on the design of the typed assembly
language other than just having a sound type system. For the
approach we follow in this article, if the assembly language has “macro”
instructions (e.g., malloc [14, 13] and newarray [26], which “expand”
into sequences of several machine instructions), the well-formedness of
the assembly program alone will be insufficient for the construction
of the global invariant. This is because Inv must hold for all machine
states reachable from $S_0$. For the intermediate states of the execution of
a macro instruction, there are no corresponding well-formed assembly
programs. Hence each one of the assembly instructions must correspond
to exactly one machine instruction. Note, however, that this exact correspondence
of instructions is not necessary in general for the syntactic
approach to work, but it facilitates the definition of the invariant and
allows for a simpler presentation.


4.  Featherweight  Typed  Assembly  Language 


The  source  language  that  we  will  be  compiling  to  FPCC  is  a  version 
of  the  typed  assembly  language  (TAL)  by  Morrisett  et  al.  [14].  The 
approach  developed  in  this  article  can  be  applied  to  a  TAL-like  language 
extended  with  higher-order  kinds  and  recursive  types.  For  simplicity 
of  presentation,  we  only  introduce  here  a  subset  of  such  a  language, 
which  we  call  the  Featherweight  Typed  Assembly  Language  (FTAL). 
It does not include polymorphism, existential types, or higher-order
kinds. However, it does support recursive types, memory allocation,
and  mutable  records  (tuples). 

The syntactic approach to FPCC as we present it here requires that
for each machine state and each state transition, there be a corresponding
FTAL program and transition. For most FTAL instructions it is
easy to see there is a one-to-one mapping to the machine instructions
of Section 2.2. However, having a malloc “macro instruction” in FTAL
(as in TAL) will not work because it cannot be mapped to a single
machine instruction and will not satisfy our requirements for generating
FPCC proofs, since there would be no corresponding FTAL state
between the expanded machine instructions. (See Section 4.6 for details
on this issue.) Our approach is to make the memory allocation model
explicit and split the malloc instruction into, in this case, two individual
instructions.

4.1.  Syntax 

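We present the syntax of FTAL in Figure 3. As in TAL, the abstract
machine state consists of a heap $H$, a register file $R$, and a sequence
of instructions $I$. The heap maps labels $l$ to heap values $h$, and the
register file maps registers $r$ to word values $v$. We use $\{\}$ for an empty
heap. The notation $H\{l \mapsto h\}$ represents a heap which maps $l$ to $h$
and on all other labels agrees with $H$. Similar notation is used for heap
types, register files, and register file types. In (regfile ty), $n < 31$, and
not all user registers need appear in the type. The notations $|H|$ and $|\Psi|$
are used to represent the number of labels in the heap and heap type,
respectively.

\[
\begin{array}{llcl}
(\textit{type}) & \tau & ::= & \alpha \mid \mathsf{int} \mid \forall[\,].\Gamma \mid \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle \mid \mu\alpha.\tau \\
(\textit{init flag}) & \varphi & ::= & 0 \mid 1 \\
(\textit{heap ty}) & \Psi & ::= & \{0{:}\tau_0, \ldots, n{:}\tau_n\} \\
(\textit{alloc pt ty}) & \rho & ::= & \mathsf{fresh} \mid \mathsf{used}(n) \\
(\textit{regfile ty}) & \Gamma & ::= & \{\mathsf{r0}{:}\tau_0, \ldots, \mathsf{r}n{:}\tau_n,\ \mathsf{r31}{:}\rho\} \\
(\textit{label}) & l & ::= & 0 \mid 1 \mid \ldots \\
(\textit{user reg}) & r & ::= & \mathsf{r0} \mid \mathsf{r1} \mid \ldots \mid \mathsf{r30} \\
(\textit{all reg}) & \hat{r} & ::= & r \mid \mathsf{r31} \\
(\textit{word val}) & v & ::= & l \mid i \mid {?\tau} \mid \mathsf{fold}\ v\ \mathsf{as}\ \tau \\
(\textit{heap val}) & h & ::= & \langle v_1, \ldots, v_n \rangle \mid \mathsf{code}[\,]\Gamma.I \\
(\textit{heap}) & H & ::= & \{0 \mapsto h_0, \ldots, n \mapsto h_n\} \\
(\textit{regfile}) & R & ::= & \{\mathsf{r0} \mapsto v_0, \ldots, \mathsf{r31} \mapsto v_{31}\} \\
(\textit{instr}) & \iota & ::= & \mathsf{add}\ r_d, r_s, r_t \mid \mathsf{addi}\ r_d, r_s, i \mid \mathsf{alloc}\ r_d[\vec{\tau}] \\
 & & \mid & \mathsf{bgt}\ r_s, r_t, l \mid \mathsf{bump}\ i \mid \mathsf{fold}\ r_d[\tau], r_s \\
 & & \mid & \mathsf{ld}\ r_d, r_s(i) \mid \mathsf{mov}\ r_d, r_s \mid \mathsf{movi}\ r_d, i \\
 & & \mid & \mathsf{movl}\ r_d, l \mid \mathsf{st}\ r_d(i), r_s \mid \mathsf{unfold}\ r_d, r_s \\
(\textit{instr seq}) & I & ::= & \iota; I \mid \mathsf{jd}\ l \mid \mathsf{jmp}\ r \\
(\textit{program}) & P & ::= & (H, R, I)
\end{array}
\]

Figure 3. Syntax of FTAL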

Only  tuples  and  code  blocks  are  stored  in  the  heap  and  thus  these  are 
the  heap  values.  Word  values  include  labels  (of  heap  values),  integers, 
recursive  data,  and  junk  values  which  are  used  by  the  operational 
semantics  to  represent  uninitialized  tuple  elements.  The  distinction 
between  word  values  and  small  values  in  TAL  is  eliminated  in  FTAL 
by  expanding  the  instruction  set.  Thus,  for  example,  there  are  now 
two  instructions  for  addition,  one  taking  a  register  (add)  and  the  other 
using  an  immediate  value  (addi)  as  the  third  operand. 

Our memory model is a simple linear unbounded heap with an allocation
pointer pointing to the heap top, initially set to the bottom of
the heap space. Memory allocation consists of copying the current allocation
pointer to a register using alloc and then adjusting the allocation
pointer with bump. In Section 5.2 we will see how these two instructions
can be directly translated into one FPCC machine instruction each.
One of the general registers, r31, is reserved as the allocation pointer
register, tracking the amount of allocated memory. FTAL instructions
will only explicitly refer to the first 31 “user” registers ($r$). To make
sure that each alloc is properly followed by a corresponding bump, the
allocation pointer register is given a special allocation status type, $\rho$,
rather than a normal type. Since there are two steps for allocation,
there are naturally two allocation status types, fresh and used(n). To
meaningfully implement linear allocation, we need an ordering on memory
labels, so we define labels as natural numbers. Whether a label is
allocated in $H$ can be easily determined by comparing it with $|H|$.

Operations on recursive types in FTAL are supported by the fold
and unfold instructions. The remaining instructions (add, addi, bgt,
mov, movi, movl, ld, and st) are equivalent or similar to those in the
original TAL. A code block is a sequence of instructions, with specified
initial register types. Code blocks always end with a jmp or jd
instruction.


4.2.  Dynamic  semantics 


The operational semantics of FTAL is presented in Figure 4. Most
of the instructions have an intuitively clear meaning. The ld and st instructions
load from and store to a tuple in the heap using the specified
index. The instruction bgt $r_s, r_t, l$ tests whether the value in $r_s$ is larger
than that in $r_t$, and if so, transfers control to the code block at $l$.

In order to allocate a tuple in the heap, first the alloc instruction
is used to copy the current heap allocation pointer to $r_d$ and allocate
the desired size in the heap. Before the next allocation, the allocation
pointer needs to be adjusted. This is achieved using the bump
instruction, which sets the allocation pointer to the next unused region
of the heap, as described earlier. (The $i$ argument is not used by the
operational semantics.) Since we assume a linear allocation method,
unused regions of the heap are simply all those beyond the currently
allocated data.

The fold instruction annotates the value of $r_s$ with the recursive
type and moves it into $r_d$. The unfold instruction extracts the value
from the recursive package in $r_s$ into $r_d$. (Note that the fold and unfold
instructions of FTAL, as in TAL, are not no-ops but copy a value
from one register to another.)
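
The two-step allocation protocol can be illustrated with a small Python model of the heap and register file. This is a sketch under simplifying assumptions: heaps are dictionaries from natural-number labels to lists, junk values are represented by the hypothetical pair `("?", t)`, code blocks are left out, and the `store` helper does not check the bound on the index.

```python
from typing import Dict, List, Tuple

Heap = Dict[int, List[object]]   # label -> tuple contents (code blocks omitted)
Regs = Dict[str, object]         # register name -> word value


def alloc(heap: Heap, regs: Regs, rd: str, types: List[str]) -> Tuple[Heap, Regs]:
    """alloc rd[t1..tn]: create a junk-filled tuple at label R(r31), copy the label to rd."""
    label = regs["r31"]
    junk_tuple = [("?", t) for t in types]   # ?t marks an uninitialized slot
    return {**heap, label: junk_tuple}, {**regs, rd: label}


def bump(heap: Heap, regs: Regs) -> Tuple[Heap, Regs]:
    """bump i: move the allocation pointer to |H|, the first unused label."""
    return heap, {**regs, "r31": len(heap)}


def store(heap: Heap, regs: Regs, rd: str, i: int, rs: str) -> Tuple[Heap, Regs]:
    """st rd(i),rs: overwrite element i of the tuple labelled R(rd)."""
    label = regs[rd]
    updated = list(heap[label])
    updated[i] = regs[rs]
    return {**heap, label: updated}, regs


heap0: Heap = {0: [1, 2]}
regs0: Regs = {"r1": 0, "r2": 42, "r31": 1}
heap1, regs1 = alloc(heap0, regs0, "r1", ["int", "int"])
heap2, regs2 = bump(heap1, regs1)
heap3, regs3 = store(heap2, regs2, "r1", 0, "r2")
```

Between `alloc` and `bump` the allocation pointer still names the label that was just allocated; this is the intermediate situation that the allocation status type used(n) describes in the static semantics.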
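
$(H, R, I) \longmapsto P$ where

\[
\begin{array}{ll}
\text{if } I = & \text{then } P = \\
\hline
\mathsf{add}\ r_d, r_s, r_t;\ I' & (H,\ R\{r_d \mapsto R(r_s) + R(r_t)\},\ I') \\
\mathsf{addi}\ r_d, r_s, i;\ I' & (H,\ R\{r_d \mapsto R(r_s) + i\},\ I') \\
\mathsf{alloc}\ r_d[\vec{\tau}];\ I' & (H',\ R\{r_d \mapsto l\},\ I') \\
 & \quad \text{where } \vec{\tau} = \tau_1, \ldots, \tau_n,\ R(\mathsf{r31}) = l, \\
 & \quad \text{and } H' = H\{l \mapsto \langle ?\tau_1, \ldots, ?\tau_n \rangle\} \\
\mathsf{bgt}\ r_s, r_t, l;\ I' & (H,\ R,\ I') \text{ when } R(r_s) \le R(r_t); \text{ and} \\
 & (H,\ R,\ I'') \text{ when } R(r_s) > R(r_t) \\
 & \quad \text{where } H(l) = \mathsf{code}[\,]\Gamma.I'' \\
\mathsf{bump}\ i;\ I' & (H,\ R\{\mathsf{r31} \mapsto |H|\},\ I') \\
\mathsf{fold}\ r_d[\tau], r_s;\ I' & (H,\ R\{r_d \mapsto \mathsf{fold}\ R(r_s)\ \mathsf{as}\ \tau\},\ I') \\
\mathsf{jd}\ l & (H,\ R,\ I') \text{ where } H(l) = \mathsf{code}[\,]\Gamma.I' \\
\mathsf{jmp}\ r & (H,\ R,\ I') \text{ where } H(R(r)) = \mathsf{code}[\,]\Gamma.I' \\
\mathsf{ld}\ r_d, r_s(i);\ I' & (H,\ R\{r_d \mapsto v_i\},\ I') \text{ where } 0 \le i < n, \\
 & \quad H(R(r_s)) = \langle v_0, \ldots, v_{n-1} \rangle \\
\mathsf{mov}\ r_d, r_s;\ I' & (H,\ R\{r_d \mapsto R(r_s)\},\ I') \\
\mathsf{movi}\ r_d, i;\ I' & (H,\ R\{r_d \mapsto i\},\ I') \\
\mathsf{movl}\ r_d, l;\ I' & (H,\ R\{r_d \mapsto l\},\ I') \\
\mathsf{st}\ r_d(i), r_s;\ I' & (H\{l \mapsto h\},\ R,\ I') \text{ where } 0 \le i < n, \\
 & \quad R(r_d) = l,\ H(l) = \langle v_0, \ldots, v_{n-1} \rangle, \text{ and} \\
 & \quad h = \langle v_0, \ldots, v_{i-1}, R(r_s), v_{i+1}, \ldots, v_{n-1} \rangle \\
\mathsf{unfold}\ r_d, r_s;\ I' & (H,\ R\{r_d \mapsto v\},\ I') \\
 & \quad \text{where } R(r_s) = \mathsf{fold}\ v\ \mathsf{as}\ \tau
\end{array}
\]

Figure 4. Operational semantics of FTAL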


4.3.  Static  semantics 


The primary judgment of the static semantics is that of the well-formedness
of a program. That in turn depends on judgments of the
well-formedness of the heap, heap type, register file, register file type,
and instruction sequence. The various typing judgments are summarized
in Figure 5.

The complete rules of the FTAL static semantics are given in Figures
6 to 8.

Judgment                          Meaning

$\vdash \tau$                     $\tau$ is a well-formed type
$\vdash \Psi$                     $\Psi$ is a well-formed heap type
$\vdash \Gamma$                   $\Gamma$ is a well-formed regfile type
$\vdash \tau_1 \le \tau_2$        $\tau_1$ is a subtype of $\tau_2$
$\vdash \Gamma_1 \subseteq \Gamma_2$   $\Gamma_1$ is a regfile subtype of $\Gamma_2$

$\vdash P$                        $P$ is a well-formed program
$\vdash H : \Psi$                 $H$ is a well-formed heap of type $\Psi$
$\Psi \vdash R : \Gamma$          $R$ is a well-formed regfile of type $\Gamma$
$\Psi \vdash l : \rho$            $l$ is a label of allocation status $\rho$
$\Psi \vdash h : \tau$ hval       $h$ is a well-formed heap value of type $\tau$
$\Psi \vdash v : \tau$            $v$ is a well-formed word value of type $\tau$
$\Psi \vdash v : \tau^{\varphi}$  $v$ is a well-formed word value of type $\tau^{\varphi}$
$\Psi; \Gamma \vdash I$           $I$ is a well-formed instruction sequence

Figure 5. Static judgments


Subtyping  is  used  for  two  purposes:  one  to  allow  a  code  block  to 
be  called  when  the  current  register  file  type  is  more  detailed  than 
needed,  and  the  other  to  be  able  to  type-check  the  initialization  of 
an  uninitialized  tuple  element  as  described  below. 

The  top-level  well-formedness  rules  are  shown  in  Figure  8.  To  have  a 
well-formed  program,  the  heap  and  register  file  must  be  well-formed  in 
some  appropriate  environments,  as  must  be  the  current  instruction  se¬ 
quence.  Additionally,  the  current  instruction  sequence  must  be  present 
in  the  heap.  The  notation  I  C  I'  means  that  I  is  a  suffix  of  I1.  For  a 
heap  to  be  well-formed  the  domain  of  the  heap  type  must  be  the  same 
as  that  of  the  heap,  and  each  heap  value  must  be  well-formed.  However, 
the  type  of  a  well-formed  register  file  need  only  specify  a  subset  of  the 
registers  in  its  domain. 

To type-check heap allocation and the load and store operations,
we follow TAL by introducing initialization flags in the type of tuples.
When a tuple is newly allocated on the heap, all the elements are
flagged with 0. A store operation will set the flag of the appropriate
element to 1. Thus, a load operation is only well-formed if the flag
of the element being accessed is set to 1. Because the type system
only approximately tracks the initialization of tuple elements, we use
subtyping to allow initialized tuple elements to be treated as if they
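were not initialized.

Judgment forms: $\vdash \tau$, $\vdash \Psi$, $\vdash \Gamma$, $\vdash \tau_1 \le \tau_2$, $\vdash \Gamma_1 \subseteq \Gamma_2$

$$\frac{\mathit{FTV}(\tau) = \emptyset}{\vdash \tau}\ (\text{type})$$

$$\frac{\vdash \tau_i \quad (0 \le i \le n)}{\vdash \{0:\tau_0, \ldots, n:\tau_n\}}\ (\text{htype})$$

$$\frac{\vdash \tau_i \quad (0 \le i \le n) \qquad n < 31}{\vdash \{r_0:\tau_0, \ldots, r_n:\tau_n, r_{31}:\rho\}}\ (\text{rftype})$$

$$\frac{\vdash \tau}{\vdash \tau \le \tau}\ (\text{reflex})$$

$$\frac{\vdash \tau_1 \le \tau_2 \qquad \vdash \tau_2 \le \tau_3}{\vdash \tau_1 \le \tau_3}\ (\text{trans})$$

$$\frac{\vdash \tau_i \quad (1 \le i \le n)}{\vdash \langle \tau_1^{\varphi_1}, \ldots, \tau_{i-1}^{\varphi_{i-1}}, \tau_i^{1}, \tau_{i+1}^{\varphi_{i+1}}, \ldots, \tau_n^{\varphi_n} \rangle \le \langle \tau_1^{\varphi_1}, \ldots, \tau_{i-1}^{\varphi_{i-1}}, \tau_i^{0}, \tau_{i+1}^{\varphi_{i+1}}, \ldots, \tau_n^{\varphi_n} \rangle}\ (\text{0-1})$$

$$\frac{\vdash \tau_i \quad (m \ge n) \quad (0 \le i \le m)}{\vdash \{r_0:\tau_0, \ldots, r_m:\tau_m, r_{31}:\rho\} \subseteq \{r_0:\tau_0, \ldots, r_n:\tau_n, r_{31}:\rho\}}\ (\text{weaken})$$

Judgment forms: $\Psi \vdash h : \tau\ \mathsf{hval}$, $\Psi \vdash v : \tau$, $\Psi \vdash l : \rho$, $\Psi \vdash v : \tau^{\varphi}$

$$\frac{\Psi \vdash v_i : \tau_i^{\varphi_i} \quad (1 \le i \le n)}{\Psi \vdash \langle v_1, \ldots, v_n \rangle : \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle\ \mathsf{hval}}\ (\text{tuple})$$

$$\frac{\vdash \Gamma \qquad \Psi; \Gamma \vdash I}{\Psi \vdash \mathsf{code}[\,]\Gamma.I : \forall[\,].\Gamma\ \mathsf{hval}}\ (\text{code})$$

$$\frac{}{\Psi \vdash i : \mathsf{int}}\ (\text{int})$$

$$\frac{\vdash \Psi(l) \le \tau}{\Psi \vdash l : \tau}\ (\text{label})$$

$$\frac{\Psi \vdash v : \tau[\mu\alpha.\tau/\alpha]}{\Psi \vdash \mathsf{fold}\ v\ \mathsf{as}\ \mu\alpha.\tau : \mu\alpha.\tau}\ (\text{fold})$$

$$\frac{l = |\Psi|}{\Psi \vdash l : \mathsf{fresh}}\ (\text{fresh})$$

$$\frac{l = |\Psi| - 1 \qquad \Psi \vdash l : \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle}{\Psi \vdash l : \mathsf{used}(n)}\ (\text{used})$$

$$\frac{\Psi \vdash v : \tau}{\Psi \vdash v : \tau^{\varphi}}\ (\text{init})$$

$$\frac{}{\Psi \vdash\ ?\tau : \tau^{0}}\ (\text{uninit})$$

Figure 6. Well-formedness of FTAL types, heap and word values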

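The subtyping rules of Figure 6 can be read operationally. The following Python sketch is an illustration only and not part of the FTAL formalization: it assumes a hypothetical encoding in which a tuple type is a list of (type, flag) pairs and a register file type is a dictionary from register names to types, and it decides the reflexive-transitive closure of the (0-1) rule together with the (weaken) rule.

```python
# Hypothetical encoding (an assumption of this sketch, not the paper's):
#   tuple type   = list of (type, flag) pairs, flag 1 = initialized, 0 = maybe not
#   regfile type = dict mapping register names (including "r31") to types

def tuple_subtype(t1, t2):
    """True if tuple type t1 is a subtype of t2 via (reflex), (trans), (0-1)."""
    if len(t1) != len(t2):
        return False
    for (ty1, flag1), (ty2, flag2) in zip(t1, t2):
        if ty1 != ty2:
            return False
        # A flag may stay the same or be weakened from 1 to 0, never raised.
        if flag1 < flag2:
            return False
    return True


def regfile_subtype(g1, g2):
    """(weaken): g1 may mention more registers than g2 but must agree on g2's."""
    return all(r in g1 and g1[r] == g2[r] for r in g2)
```

For instance, a tuple whose first field is initialized may be used where an uninitialized tuple is expected, but not the other way around.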

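The special allocation register is typed using a new judgment of
allocation status:

$$\frac{l = |\Psi|}{\Psi \vdash l : \mathsf{fresh}}\ (\text{fresh})$$

$$\frac{l = |\Psi| - 1 \qquad \Psi \vdash l : \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle}{\Psi \vdash l : \mathsf{used}(n)}\ (\text{used})$$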
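In the first typing rule, a label whose value is equal to the
size of the heap type must necessarily be unallocated, i.e. fresh. When
allocation takes place, the allocation register may temporarily be
pointing to the newly allocated memory, and thus will have allocation
status used(n), where n is the length of the allocated tuple. The
assignment of allocation status interacts with the two novel FTAL
instructions, alloc and bump, as shown in their typing rules:

$$\frac{\vdash \tau_i \qquad \Psi; \Gamma\{r_d : \langle \tau_1^{0}, \ldots, \tau_n^{0} \rangle\}\{r_{31} : \mathsf{used}(n)\} \vdash I}{\Psi; \Gamma\{r_{31} : \mathsf{fresh}\} \vdash \mathtt{alloc}\ r_d[\tau_1, \ldots, \tau_n]; I}\ (\text{alloc})$$

$$\frac{\Psi; \Gamma\{r_{31} : \mathsf{fresh}\} \vdash I}{\Psi; \Gamma\{r_{31} : \mathsf{used}(n)\} \vdash \mathtt{bump}\ n; I}\ (\text{bump})$$

For an alloc instruction to be well-typed, the allocation register, r31,
must be in the fresh status, since otherwise, as can be seen from the
operational semantics, the previously allocated data will be overwritten.
After the alloc instruction, the remainder of the instruction sequence is
checked with the status of r31 changed to used(n). No further allocation
can take place until a bump n instruction is encountered, which resets
the status to fresh, corresponding again to the update in the operational
semantics. (The need for the n argument will become clear later when
translating FTAL to the actual machine instructions.)

Judgment form: $\Psi; \Gamma \vdash I$

$$\frac{\Gamma(r_s) = \mathsf{int} \qquad \Gamma(r_t) = \mathsf{int} \qquad \Psi; \Gamma\{r_d : \mathsf{int}\} \vdash I}{\Psi; \Gamma \vdash \mathtt{add}\ r_d, r_s, r_t; I}\ (\text{add})$$

$$\frac{\Gamma(r_s) = \mathsf{int} \qquad \Psi; \Gamma\{r_d : \mathsf{int}\} \vdash I}{\Psi; \Gamma \vdash \mathtt{addi}\ r_d, r_s, i; I}\ (\text{addi})$$

$$\frac{\vdash \tau_i \qquad \Psi; \Gamma\{r_d : \langle \tau_1^{0}, \ldots, \tau_n^{0} \rangle\}\{r_{31} : \mathsf{used}(n)\} \vdash I}{\Psi; \Gamma\{r_{31} : \mathsf{fresh}\} \vdash \mathtt{alloc}\ r_d[\tau_1, \ldots, \tau_n]; I}\ (\text{alloc})$$

$$\frac{\Psi; \Gamma\{r_{31} : \mathsf{fresh}\} \vdash I}{\Psi; \Gamma\{r_{31} : \mathsf{used}(n)\} \vdash \mathtt{bump}\ n; I}\ (\text{bump})$$

$$\frac{\Gamma(r_s) = \mathsf{int} \qquad \Gamma(r_t) = \mathsf{int} \qquad \Psi(l) = \forall[\,].\Gamma' \qquad \vdash \Gamma \subseteq \Gamma' \qquad \Psi; \Gamma \vdash I}{\Psi; \Gamma \vdash \mathtt{bgt}\ r_s, r_t, l; I}\ (\text{bgt})$$

$$\frac{\Psi; \Gamma\{r_d : \Gamma(r_s)\} \vdash I}{\Psi; \Gamma \vdash \mathtt{mov}\ r_d, r_s; I}\ (\text{mov})$$

$$\frac{\Psi; \Gamma\{r_d : \mathsf{int}\} \vdash I}{\Psi; \Gamma \vdash \mathtt{movi}\ r_d, i; I}\ (\text{movi})$$

$$\frac{\Psi; \Gamma\{r_d : \tau\} \vdash I \qquad \vdash \Psi(l) \le \tau}{\Psi; \Gamma \vdash \mathtt{movl}\ r_d, l; I}\ (\text{movl})$$

$$\frac{\Gamma(r_s) = \langle \tau_0^{\varphi_0}, \ldots, \tau_{i-1}^{\varphi_{i-1}}, \tau_i^{1}, \tau_{i+1}^{\varphi_{i+1}}, \ldots, \tau_{n-1}^{\varphi_{n-1}} \rangle \qquad \Psi; \Gamma\{r_d : \tau_i\} \vdash I \qquad (0 \le i < n)}{\Psi; \Gamma \vdash \mathtt{ld}\ r_d, r_s(i); I}\ (\text{ld})$$

$$\frac{\Gamma(r_s) = \tau_i \qquad \Gamma(r_d) = \langle \tau_0^{\varphi_0}, \ldots, \tau_{n-1}^{\varphi_{n-1}} \rangle \qquad \Psi; \Gamma\{r_d : \langle \tau_0^{\varphi_0}, \ldots, \tau_{i-1}^{\varphi_{i-1}}, \tau_i^{1}, \tau_{i+1}^{\varphi_{i+1}}, \ldots, \tau_{n-1}^{\varphi_{n-1}} \rangle\} \vdash I \qquad (0 \le i < n)}{\Psi; \Gamma \vdash \mathtt{st}\ r_d(i), r_s; I}\ (\text{st})$$

$$\frac{\Gamma(r_s) = \tau[\mu\alpha.\tau/\alpha] \qquad \Psi; \Gamma\{r_d : \mu\alpha.\tau\} \vdash I}{\Psi; \Gamma \vdash \mathtt{fold}\ r_d[\mu\alpha.\tau], r_s; I}\ (\text{fold-i})$$

$$\frac{\Gamma(r_s) = \mu\alpha.\tau \qquad \Psi; \Gamma\{r_d : \tau[\mu\alpha.\tau/\alpha]\} \vdash I}{\Psi; \Gamma \vdash \mathtt{unfold}\ r_d, r_s; I}\ (\text{unfold})$$

$$\frac{\Psi(l) = \forall[\,].\Gamma' \qquad \vdash \Gamma \subseteq \Gamma'}{\Psi; \Gamma \vdash \mathtt{jd}\ l}\ (\text{jd})$$

$$\frac{\Gamma(r) = \forall[\,].\Gamma' \qquad \vdash \Gamma \subseteq \Gamma'}{\Psi; \Gamma \vdash \mathtt{jmp}\ r}\ (\text{jmp})$$

Figure 7. Well-formedness of FTAL instruction sequences

Judgment forms: $\vdash P$, $\vdash H : \Psi$, $\Psi \vdash R : \Gamma$

$$\frac{\vdash H : \Psi \qquad \Psi \vdash R : \Gamma \qquad \Psi; \Gamma \vdash I \qquad \exists l \in \mathit{Dom}(H).\ H(l) = \mathsf{code}[\,]\Gamma'.I' \text{ and } I \subseteq I'}{\vdash (H, R, I)}\ (\text{prog})$$

$$\frac{\vdash \Psi \qquad |\Psi| = |H| \qquad \Psi \vdash H(l) : \Psi(l)\ \mathsf{hval} \quad (0 \le l < |H|)}{\vdash H : \Psi}\ (\text{heap})$$

$$\frac{\Psi \vdash R(r_i) : \tau_i \quad (0 \le i \le n) \qquad \Psi \vdash R(r_{31}) : \rho \qquad \forall r \in \mathit{Dom}(R) - \{r_{31}\}.\ \text{if } R(r) = l \text{ then } l < |\Psi|}{\Psi \vdash R : \{r_0:\tau_0, \ldots, r_n:\tau_n, r_{31}:\rho\}}\ (\text{reg})$$

Figure 8. Well-formedness of FTAL programs, heaps, and register files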

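The interplay between alloc, bump, and the allocation status of r31 can be pictured as a two-state protocol. The sketch below is only an illustration under assumed names: `check_alloc_status` and the tuple encoding of instructions are hypothetical, and the function tracks nothing but the status of r31, ignoring every other premise of the typing rules.

```python
def check_alloc_status(instrs, status="fresh"):
    """Track the allocation status of r31 through a list of instructions.

    Each instruction is a tuple whose first component is the opcode, e.g.
    ("alloc", n) for an n-tuple allocation or ("bump", n). Every other
    opcode leaves the status unchanged. Returns the final status, or raises
    TypeError when the side condition on r31 of (alloc) or (bump) fails.
    """
    for op, *args in instrs:
        if op == "alloc":
            if status != "fresh":
                raise TypeError("alloc requires r31 : fresh")
            status = ("used", args[0])
        elif op == "bump":
            if status != ("used", args[0]):
                raise TypeError("bump n requires r31 : used(n)")
            status = "fresh"
    return status
```

Two consecutive alloc instructions are rejected, which mirrors the observation above that a second allocation would overwrite the data allocated by the first.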

4.4.  Examples 

In this section, we give a few examples of FTAL programs to demonstrate
that such a language (eventually extended with polymorphism
and existentials) provides features which make it suitable for compiling
high-level languages such as Java, ML, or Safe C.

Our first example is the calculation of a Fibonacci number in Figure 9.
The C-like program at the top of the figure can be compiled to the
FTAL code below it. The code segments fib, fib_loop and fib_return form
a function, written in CPS, which calculates the Fibonacci number with
index given in r1, and then passes control to the continuation function
given in r30. The main block calls fib to calculate $F_{10}$ and passes the
address of the halt block as its continuation. fib initializes the loop
variables and then jumps into the loop code segment fib_loop, which
jumps to fib_return when the calculation is done.

The  second  example,  in  Figure  10,  demonstrates  how  to  use  recursive 
types  and  memory  allocation  to  handle  classes  and  objects.  Class  c 
has  no  data  fields  and  only  one  method  f,  which  takes  an  object  of 
class  c  and  invokes  its  method  f.  In  the  main  program,  an  object  of 
class  c  is  created  and  its  method  f  is  called  with  the  object  itself  as 
argument.  The  program  will  end  up  in  an  infinite  recursive  call  to  c.f. 
In  FTAL,  an  object  of  class  c  is  represented  as  a  recursive  tuple  type 
whose  only  element  is  a  code  block  with  an  only  argument  of  the  object 
type c. The code block at label c_f uses the unfold and ld instructions
to  extract  the  argument  object’s  own  method  f,  and  then  jumps  to  it. 
The  constructor  for  c,  inlined  in  the  main  code  block,  uses  the  alloc  and 
bump  instructions  to  allocate  heap  space  for  a  tuple,  then  initializes  its 
method  f  with  the  label  c_f,  and  folds  the  tuple  into  an  object  using 
the  fold  instruction.  Similarly  to  c_f,  the  main  code  block  then  extracts 
method  f  from  the  newly  created  object  and  jumps  to  it. 

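To make the control flow of the Fibonacci example concrete, the following Python sketch interprets an untyped rendering of the FTAL code of Figure 9. It is an illustration under several assumptions: the interpreter `run`, the tuple encoding of instructions, and the convention that execution stops on reaching the label `halt` (instead of looping there forever) are all choices of this sketch and do not reproduce the operational semantics of the paper.

```python
def run(heap, entry, max_steps=10_000):
    """Execute code blocks from `heap` starting at `entry` until 'halt'."""
    regs = {}
    block, pc = entry, 0
    for _ in range(max_steps):
        if block == "halt":          # sketch convention: stop at the halt block
            return regs
        op, *a = heap[block][pc]
        pc += 1
        if op == "mov":
            regs[a[0]] = regs[a[1]]
        elif op in ("movi", "movl"):  # integers and labels are both plain values here
            regs[a[0]] = a[1]
        elif op == "add":
            regs[a[0]] = regs[a[1]] + regs[a[2]]
        elif op == "addi":
            regs[a[0]] = regs[a[1]] + a[2]
        elif op == "bgt":             # branch when rs > rt
            if regs[a[0]] > regs[a[1]]:
                block, pc = a[2], 0
        elif op == "jd":
            block, pc = a[0], 0
        elif op == "jmp":
            block, pc = regs[a[0]], 0
        else:
            raise ValueError(f"unsupported instruction: {op}")
    raise RuntimeError("step budget exhausted")


HEAP = {
    "fib": [("mov", "r3", "r1"), ("movi", "r1", 1), ("movi", "r2", 1),
            ("movi", "r4", 2), ("jd", "fib_loop")],
    "fib_loop": [("bgt", "r4", "r3", "fib_return"), ("add", "r5", "r1", "r2"),
                 ("mov", "r1", "r2"), ("mov", "r2", "r5"),
                 ("addi", "r4", "r4", 1), ("jd", "fib_loop")],
    "fib_return": [("jmp", "r30")],
    "main": [("movi", "r1", 10), ("movl", "r30", "halt"), ("jd", "fib")],
}

result = run(HEAP, "main")
print(result["r1"])  # prints 55
```

The continuation-passing structure is visible in the trace: main stores the label of halt in r30, and fib_return transfers control to whatever r30 holds.
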
4.5.  Soundness 

In order to produce the necessary FPCC proofs as described in Section 3,
we must encode the complete semantics of FTAL in CiC along
with its proof of soundness, which will be used in defining and proving
the FPCC propositions. The critical theorems for the soundness of
FTAL are the usual progress and preservation lemmas:

Theorem 1 (Progress)

If $\vdash P$, then there exists $P'$ such that $P \longmapsto P'$.

Theorem 2 (Preservation)

If $\vdash P$ and $P \longmapsto P'$, then $\vdash P'$.

As usual, several intermediate lemmas are used to prove these two
theorems, all of which can be formally encoded and proved in the Coq
proof assistant.

int fib (n:int) {                          // "Safe C" code
    int a=1, b=1;
    for (int i=2; i<=n; i++) {
        int c = a + b; a = b; b = c
    }
    return a
}

int main () {
    return fib(10)
}

P = (H, {}, I)                             // FTAL code

H =  fib:        code[]{r1:int, r30:∀[]{r1:int}}.
                   mov  r3, r1;
                   movi r1, 1;
                   movi r2, 1;
                   movi r4, 2;
                   jd   fib_loop

     fib_loop:   code[]{r1:int, r2:int, r3:int, r4:int,
                        r30:∀[]{r1:int}}.
                   bgt  r4, r3, fib_return;
                   add  r5, r1, r2;
                   mov  r1, r2;
                   mov  r2, r5;
                   addi r4, r4, 1;
                   jd   fib_loop

     fib_return: code[]{r1:int, r30:∀[]{r1:int}}.
                   jmp  r30

     halt:       code[]{r1:int}.
                   jd   halt

     main:       code[]{}.
                   I

I =  movi r1, 10;
     movl r30, halt;
     jd   fib

Figure 9. FTAL Example: Fibonacci Numbers

The most important of these lemmas are given below. Their encoding
in Coq is described in Section 6.

Lemma 1 (Register File Update)

1. If $\Psi \vdash R : \Gamma$ and $\Psi \vdash v : \tau$ then $\Psi \vdash R\{r \mapsto v\} : \Gamma\{r : \tau\}$.

2. If $\Psi \vdash R : \Gamma$ and $\Psi \vdash l : \rho$ then $\Psi \vdash R\{r_{31} \mapsto l\} : \Gamma\{r_{31} : \rho\}$.

class c {                                  // "Safe C++" code
    void f (c x) { x.f(x) }
}

void main () {
    c x = new c;
    x.f(x)
}

P = (H, {}, I)                             // FTAL code

c =  μα.⟨∀[]{r1:α}⟩

H =  c_f:   code[]{r1:c}.
              unfold r2, r1;
              ld     r2, r2(0);
              jmp    r2

     main:  code[]{}.
              I

I =  alloc  r1[∀[]{r1:c}];
     bump   1;
     movl   r2, c_f;
     st     r1(0), r2;
     fold   r1[c], r1;
     unfold r2, r1;
     ld     r2, r2(0);
     jmp    r2

Figure 10. FTAL Example: Mini-Object

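Lemma 2 (Canonical Word Forms) If $\vdash H : \Psi$ and $\Psi \vdash v : \tau$ then:

1. if $\tau = \mathsf{int}$ then $v = i$;

2. if $\tau = \forall[\,].\Gamma$ then $v = l$ and $H(l) = \mathsf{code}[\,]\Gamma.I$;

3. if $\tau = \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle$ then $v = l$;

4. if $\tau = \mu\alpha.\tau'$ then $v = \mathsf{fold}\ v'\ \mathsf{as}\ \tau$.

Lemma 3 (Canonical Register Word Forms) If $\Psi \vdash R : \Gamma$ and
$\Gamma(r) = \tau$ then:

1. $R(r) = v$,

2. if $\tau = \mathsf{int}$ then $R(r) = i$;

3. if $\tau = \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle$ then $R(r) = l$.

Lemma 4 (Canonical Heap Forms) If $\Psi \vdash h : \tau\ \mathsf{hval}$ then:

1. if $\tau = \forall[\,].\Gamma$ then $h = \mathsf{code}[\,]\Gamma.I$ and $\Psi; \Gamma \vdash I$;

2. if $\tau = \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle$ then $h = \langle v_1, \ldots, v_n \rangle$ and $\Psi \vdash v_i : \tau_i^{\varphi_i}$.

Lemma 5 (Register File Weakening) If $\vdash \Gamma_1 \subseteq \Gamma_2$ and $\Psi \vdash R : \Gamma_1$
then $\Psi \vdash R : \Gamma_2$.

Lemma 6 (Heap Extension) If $\vdash H : \Psi$, $l = |H|$ (thus,
$l \notin \mathit{Dom}(H)$), and $\vdash \tau$, then:

1. $\vdash \Psi\{l : \tau\}$;

2. if $\Psi \vdash v : \tau'$ then $\Psi\{l : \tau\} \vdash v : \tau'$;

3. if $\Psi \vdash v : \tau'^{\varphi}$ then $\Psi\{l : \tau\} \vdash v : \tau'^{\varphi}$;

4. if $\Psi; \Gamma \vdash I$ then $\Psi\{l : \tau\}; \Gamma \vdash I$;

5. if $\Psi \vdash R : \Gamma\{r_{31} : \mathsf{fresh}\}$ then $\Psi\{l : \tau\} \vdash R : \Gamma\{r_{31} : \mathsf{used}(n)\}$;

6. if $\Psi \vdash h : \tau'\ \mathsf{hval}$ then $\Psi\{l : \tau\} \vdash h : \tau'\ \mathsf{hval}$;

7. if $\Psi\{l : \tau\} \vdash h : \tau\ \mathsf{hval}$ then $\vdash H\{l \mapsto h\} : \Psi\{l : \tau\}$.

Lemma 7 (Heap Update) If $\vdash H : \Psi$ and $\vdash \tau \le \Psi(l)$ then:

1. $\vdash \Psi\{l : \tau\}$;

2. if $\Psi \vdash v : \tau'$ then $\Psi\{l : \tau\} \vdash v : \tau'$;

3. if $\Psi \vdash v : \tau'^{\varphi}$ then $\Psi\{l : \tau\} \vdash v : \tau'^{\varphi}$;

4. if $\Psi; \Gamma \vdash I$ then $\Psi\{l : \tau\}; \Gamma \vdash I$;

5. if $\Psi \vdash R : \Gamma$ then $\Psi\{l : \tau\} \vdash R : \Gamma$;

6. if $\Psi \vdash h : \tau'\ \mathsf{hval}$ then $\Psi\{l : \tau\} \vdash h : \tau'\ \mathsf{hval}$;

7. if $\Psi\{l : \tau\} \vdash h : \tau\ \mathsf{hval}$ then $\vdash H\{l \mapsto h\} : \Psi\{l : \tau\}$.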

Now  that  we  have  an  assembly  language  with  a  sound  type  system, 
we are ready to show how to generate proof-carrying code from a
well-typed FTAL program.

4.6.  Designing  TAL  for  FPCC 

We  have  designed  a  novel  FTAL  language  for  our  presentation  in  this 
article  which  corresponds  closely  to  the  underlying  machine  defined  in 
Section  2.2.  As  will  become  clear  in  the  next  section,  every  well-formed 
FTAL  state  can  be  mapped  to  a  safe  machine  state,  and  this  property 
is used to produce a safety proof for the machine state.

For safety policies which need to enforce complex constraints on
every machine state or step, such a one-to-one mapping can be very
important. In general, however, this strict correspondence is not necessary
for our syntactic approach to work. For example, if we wished to retain
"macro" instructions in the FTAL language, our FPCC Preservation
might be modified to

$$\Pi S : \mathit{State}.\ \mathit{Inv}(S) \to \exists n : \mathit{Nat}.\ \mathit{Inv}(\mathit{Step}^{(n+1)}(S))$$

stating that starting from a state satisfying the global invariant, the
machine will eventually (after one or more steps) reach another state
satisfying the invariant.

Also,  when  introducing  polymorphism  or  existentials  into  the  FTAL 
language,  there  will  be  certain  FTAL  operations  (e.g.  type  application) 
which  do  not  correspond  to  any  run-time  machine  instructions  at  all. 
In  this  case,  the  FTAL  operation  would  correspond  to  a  “cast”  in  the 
FPCC  proof  for  the  machine  state. 

Another reason why naively using existing typed assembly languages
will not necessarily help in producing FPCC is that the type system
must be designed to enforce appropriate invariants. There are
requirements in the typing rules of FTAL which are not critical for FTAL
soundness but are necessary when translating FTAL to FPCC as
described in the next section. An example of this is the requirement in the
(reg) rule (Figure 8) that all labels in registers be within the domain
of the heap (including those registers that are not specified in the type
of the register file and hence not accessible by well-formed code). This
condition is crucial in proving the properties discussed in Section 5.3.


5.  Translating  FTAL  to  FPCC 

As outlined in Section 2.3, an FPCC package provides an initial state,
$S_0$, and a proof that the state satisfies the safety policy. In the next few
subsections, we show how to translate an FTAL program into a
machine state and how to use the FTAL type system to generate proofs of
the FPCC Preservation and Progress propositions, which imply safety.

5.1.  From  FTAL  to  machine  state 

FTAL programs are compiled to machine code by (1) defining a layout
for the memory, which maps heap values of the program to memory
addresses, (2) translating FTAL instructions to machine instructions,
and (3) choosing the appropriate program counter and register values.
The layout must ensure that there are no overlaps between the images
of tuples and code sequences in the memory. Our choice of the FTAL
instruction set allows us to translate every FTAL instruction into one
machine instruction word.

We  will  express  the  correspondence  between  an  FTAL  program  and 
a  machine  state  by  a  family  of  translation  relations  upon  the  various 
syntactic  categories.  The  forms  of  these  relations  are: 


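Relation                                          Correspondence

$(H, R, I) \Rightarrow (M, \mathbb{R}, pc)$       FTAL program to machine state
$L \vdash H \Rightarrow M$                        FTAL heap to memory
$L \vdash R \Rightarrow \mathbb{R}$               register files
$L \vdash I \Rightarrow_s M[i..j]$                sequence of instructions to memory layout
$L \vdash \iota \Rightarrow_i \mathit{Dc}(M(i))$  instruction translation
$L \vdash h \Rightarrow_h M[i..j]$                heap value to memory layout
$L \vdash v \Rightarrow_w w$                      word value to machine word

Here $\mathbb{R}$ denotes the machine register file, as opposed to the FTAL register file $R$.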

An important step in the translation is flattening the FTAL heap
into the machine memory. To support this, we define a Layout function
of type $\mathit{Heap} \to \mathit{Label} \to \mathit{Word}$ which, given an FTAL heap, returns a
mapping from labels to memory addresses. (In the relations above, L
is this Layout function applied to the heap.) For our current purpose,
we define

$$\mathit{Layout}(\{\})(l') = 0$$

$$\mathit{Layout}(H\{l \mapsto h\})(l') = \begin{cases} w + \mathit{size}(h) & \text{if } l' > l \\ w & \text{otherwise,} \end{cases} \qquad \text{where } w = \mathit{Layout}(H)(l')$$

where $\mathit{size}(h)$ is the size of the heap value h (n for an n-tuple and, for a
code block, the length of the instruction sequence). This Layout function
maps labels to addresses starting at 0 and forces the translation $\Rightarrow$
to lay out FTAL heap values compactly, consecutively, and with no
overlapping (due to the implicit constraint that the labels in the heap
appear in descending order). Additionally, the first unused label (whose
value equals the size of the heap) is mapped to the first unused address.
These properties of the Layout function are useful later on in proving
Preservation and Progress.

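The properties claimed for the Layout function can be checked on a small case. The Python sketch below is an assumption-laden reading of the definition, not the authors' code: it models a heap as a list indexed by label, models both tuples and code blocks as lists (so that `size` is just the length), and computes the address of a label as the total size of the heap values stored at smaller labels.

```python
def size(h):
    """Size of a heap value; tuples and code blocks are both modeled as lists."""
    return len(h)


def layout(heap):
    """Return the label -> address map for `heap`, a list indexed by label."""
    def address(label):
        # Every heap value stored at a smaller label pushes this one further up.
        return sum(size(h) for l, h in enumerate(heap) if l < label)
    return address


# A heap with a 2-instruction block, a 3-tuple, and a 1-instruction block.
example_heap = [["mov", "jd"], [0, 0, 0], ["jmp"]]
L = layout(example_heap)
print([L(l) for l in range(4)])  # prints [0, 2, 5, 6]
```

Label 3 is the first unused label of this heap, and it is mapped to address 6, the first unused address, as stated above.
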
Recall that the machine memory is modeled as a function, $\mathit{Word} \to
\mathit{Word}$, so $M(w)$ denotes the memory word at address w. The judgments
$L \vdash I \Rightarrow_s M[i..j]$ and $L \vdash h \Rightarrow_h M[i..j]$ state that a sequence of
instructions and a heap value (either a tuple or a code block), respectively,
translate to a series of consecutive words in memory M from address i
to address j.

The  translation  relations  are  defined  by  a  set  of  inference  rules, 
given  in  Figure  11.  The  rules  are  straightforward  and  operate  purely 
on  the  syntax  of  FTAL  programs.  Note  that  FTAL  type  annotations 
are  discarded  in  the  translation  (for  example,  in  the  fold  instruction), 
and  label  word  values  are  mapped  to  memory  words  using  the  layout 
function.  Each  FTAL  heap  value  corresponds  to  a  sequence  of  words 
in  memory.  A  heap  translates  to  a  memory  if  every  heap  value  in  the 
heap  translates  to  the  appropriate  sequence  of  memory  words.  Registers 
translate  directly  between  FTAL  and  the  machine.  An  FTAL  program 
corresponds  to  a  machine  state  if  the  translation  relation  holds  on  the 
heap  and  register  file,  and  if  the  current  instruction  sequence  is  at 
some  location  in  the  memory.  Since  in  a  well-typed  FTAL  program  the 
current  instruction  sequence  must  also  be  present  in  the  heap,  we  can 
always  translate  it  to  a  known  program  counter.  Notice  that  the  FTAL 
alloc  and  bump  instructions  correspond  to  machine  move  and  addition 
instructions,  respectively,  using  the  register  reserved  for  allocation,  r31. 

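The flattening of tuples into memory can be sketched as follows. This Python fragment is illustrative only and rests on several assumptions that the text does not fix: word values are tagged tuples, a folded value is translated like the value it wraps, an uninitialized value is translated to the arbitrary word 0, and code blocks are left out because their translation would need a concrete instruction encoding.

```python
def layout(heap):
    """Addresses of labels 0..len(heap); the last entry is the first free address."""
    addrs, next_free = [], 0
    for h in heap:
        addrs.append(next_free)
        next_free += len(h)
    addrs.append(next_free)
    return addrs


def translate_word(L, v):
    """Translate a tagged FTAL word value to a machine word (a Python int)."""
    kind = v[0]
    if kind == "int":
        return v[1]
    if kind == "label":
        return L[v[1]]                    # labels become addresses via the layout
    if kind == "uninit":
        return 0                          # any word would do; 0 is arbitrary
    if kind == "fold":
        return translate_word(L, v[1])    # assumption: the annotation is dropped
    raise ValueError(f"unknown word value: {v!r}")


def flatten(heap):
    """Lay out a heap of tuples as a dict from addresses to machine words."""
    L = layout(heap)
    memory = {}
    for label, h in enumerate(heap):
        for offset, v in enumerate(h):
            memory[L[label] + offset] = translate_word(L, v)
    return memory


heap = [[("int", 7), ("label", 1)], [("uninit",), ("fold", ("label", 0))]]
print(flatten(heap))  # prints {0: 7, 1: 2, 2: 0, 3: 0}
```

Because an uninitialized field may translate to any word, a function of this kind picks one representative among the many memories related to the heap.
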
The translation relation as presented in Figure 11 is also not
deterministic with respect to the unused and uninitialized parts of the
memory and to the positioning of the program counter. However, it is
straightforward on the basis of its definition to develop a deterministic
function which translates an FTAL program into a machine state
for which the translation relation described above holds. In the next
section, we will show how this initial translation is used to provide the
Initial Condition FPCC proof.

5.2.  The  global  invariant 

As discussed in Section 3, in addition to translating the FTAL program
to an initial machine state $S_0$, we must define the invariant Inv, which
holds during the execution of a machine program, and provide proofs
of:

Initial Condition: $\mathit{Inv}(S_0)$

Preservation: $\Pi S : \mathit{State}.\ \mathit{Inv}(S) \to \mathit{Inv}(\mathit{Step}(S))$

Progress: $\Pi S : \mathit{State}.\ \mathit{Inv}(S) \to \mathit{SP}(S)$

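The way these three properties combine into safety can be spelled out as a short induction on the number of steps. The derivation below is a sketch in informal notation, writing $\mathit{Step}^{n}$ for n applications of Step; it only restates the standard argument and adds no new assumption about the machine.

```latex
\begin{align*}
&\textbf{Claim.}\quad \forall n.\ \mathit{Inv}(\mathit{Step}^{n}(S_0)). \\
&\textbf{Base.}\quad \mathit{Step}^{0}(S_0) = S_0, \text{ and } \mathit{Inv}(S_0)
   \text{ holds by the Initial Condition.} \\
&\textbf{Step.}\quad \text{If } \mathit{Inv}(\mathit{Step}^{n}(S_0)), \text{ then by Preservation }
   \mathit{Inv}(\mathit{Step}(\mathit{Step}^{n}(S_0))) = \mathit{Inv}(\mathit{Step}^{n+1}(S_0)). \\
&\textbf{Safety.}\quad \text{By Progress, } \mathit{Inv}(\mathit{Step}^{n}(S_0)) \to
   \mathit{SP}(\mathit{Step}^{n}(S_0)), \text{ hence } \forall n.\ \mathit{SP}(\mathit{Step}^{n}(S_0)).
\end{align*}
```
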
The invariant simply has to ensure that the machine state at each
step corresponds to a well-typed FTAL program, which will allow us to
use the formalized versions of the proofs of the progress and preservation
lemmas for FTAL to generate formal proofs of the corresponding
properties of the invariant. Since the definition of Inv requires us to
state that an FTAL program is well-typed, it must be expressed not
just in terms of FTAL programs, but of their typing derivations:

$$\mathit{Inv}(S) = \exists P : \mathit{program}.\ \exists D : (\vdash P).\ P \Rightarrow S$$

where the type annotation $\vdash P$ in the quantification on D introduces
D as a proof term for the judgment $\vdash P$.

Instruction Sequences

$$\frac{L \vdash \iota \Rightarrow_i \mathit{Dc}(M(i)) \qquad L \vdash I \Rightarrow_s M[(i+1)..j]}{L \vdash \iota; I \Rightarrow_s M[i..j]}$$

$$\frac{\mathit{Dc}(M(i)) = \mathtt{jd}\ (L(l'))}{L \vdash \mathtt{jd}\ l' \Rightarrow_s M[i..i]} \qquad \frac{\mathit{Dc}(M(i)) = \mathtt{jmp}\ r}{L \vdash \mathtt{jmp}\ r \Rightarrow_s M[i..i]}$$

Heap Values

$$\frac{L \vdash v_i \Rightarrow_w M(j+i) \quad \text{for } 0 \le i \le n}{L \vdash \langle v_0, \ldots, v_n \rangle \Rightarrow_h M[j..(j+n)]} \qquad \frac{L \vdash I \Rightarrow_s M[i..j]}{L \vdash \mathsf{code}[\,]\Gamma.I \Rightarrow_h M[i..j]}$$

Heap, Register File, Program

$$\frac{L \vdash H(l) \Rightarrow_h M[L(l)..L(l+1)-1] \quad \text{for } 0 \le l < |H|}{L \vdash H \Rightarrow M} \qquad \frac{L \vdash R(r) \Rightarrow_w \mathbb{R}(r)}{L \vdash R \Rightarrow \mathbb{R}}$$

$$\frac{\mathit{Layout}(H) \vdash H \Rightarrow M \qquad \mathit{Layout}(H) \vdash R \Rightarrow \mathbb{R} \qquad \mathit{Layout}(H) \vdash I \Rightarrow_s M[pc..pc+|I|-1]}{(H, R, I) \Rightarrow (M, \mathbb{R}, pc)}$$

where $\exists l \in \mathit{Dom}(H).\ (H(l) = \mathsf{code}[\,]\Gamma.I',\ I \subseteq I'$, and $pc = \mathit{Layout}(H)(l) + |I'| - |I|)$.

Figure 11. Relating FTAL programs to machine states

The proof of the initial condition can now be obtained directly in the
process of translating an initial well-formed FTAL program to machine
state as described in Section 5.1. It remains, therefore, to prove the two
lemmas.

5.3.  The  Preservation  and  Progress  properties 

Progress in our case is easy to prove: since the invariant states that
there exists a well-typed FTAL program which translates to the current
state, it is obvious by examination of the translation rules that such an
FTAL program will never translate to a state in which the program
counter points to an illegal instruction.

The remaining proof term, for Preservation, is thus the most involved
of the generated FPCC proofs. It is obtained in the following
way:

Given a program P and a typing derivation for $\vdash P$, we know by
FTAL progress that there exists a program $P'$ such that $P \longmapsto P'$.
Furthermore, by FTAL preservation, we know that $\vdash P'$. Now, the
premise of our FPCC Preservation theorem provides us with a machine
state S such that $P \Rightarrow S$, and we need to show that there exists
another well-typed program that translates to Step(S). The semantics
of FTAL has been set up so that this well-typed program is exactly
$P'$. It remains now for us to prove that indeed $P' \Rightarrow \mathit{Step}(S)$, as
diagrammed in Figure 12.

Essentially, we need to show that the FTAL evaluation relation
corresponds to the machine's step function. This is proved by induction
on the typing derivation of $\vdash P$. For each possible case, we use inversion
on the structure of P, the FTAL evaluation relation, the translation
relation, and the machine Step function to gain the necessary information
about the structure of $P'$, S, and Step(S). Many of the cases of
this proof are fairly straightforward.

Let us briefly consider one of the interesting cases of the Preservation
proof, which is when the current instruction is alloc. Corresponding to
the diagram in Figure 12, we have the following setup:

$P = (H, R, \mathtt{alloc}\ r_d[\tau_1, \ldots, \tau_n]; I)$
$P' = (H', R', I)$
$S = (M, \mathbb{R}, pc)$
$\mathit{Step}(S) = (M, \mathbb{R}', pc + 1)$

where $H'$, $R'$, and $\mathbb{R}'$ can be determined by the operational semantics
of FTAL and the definition of the Step function (Figure 2).

Figure 12. Relationship between FTAL evaluation and machine semantics

We now need to prove that $P'$ is related to Step(S) by the translation.
First, we know by the properties of the layout function that
applying it to an extended heap maintains the mapping of all the
existing labels in the old heap. Now, the FTAL heap is updated after
evaluation but the memory stays the same after the step. However,
since the update to the heap is only with uninitialized values which
can be translated to any word, the translation will still hold on the
unchanged memory. Thus, we can show that the updated heap translates
to the unaltered memory. Then, relating the two updated register files
is not difficult, nor is showing that the residual instruction sequence
corresponds to the next program counter value. Well-formedness of P
(i.e. $\vdash P$) is used in various steps of this proof, for instance, to reason
that any labels in the registers are within the domain of the heap, hence
the layout function on the updated heap, $H'$, preserves the mappings
of existing labels.

This  completes  the  translation,  or  compilation,  of  a  well-typed  FTAL 
program  to  an  FPCC  code  package.  The  FTAL  program  can  be  shown 
to  correspond  to  an  initial  machine  state  and  that  state  can  be  shown 
safe  (as  described  in  Sections  2.3  and  3)  using  the  proofs  of  Preservation 
and Progress developed here.


6.  Implementation

An  implementation  of  the  syntactic  approach  presented  in  this  article 
consists  of  an  FTAL  compiler  which  generates  FPCC  packages.  An 
FPCC  package  consists  of  two  parts:  the  initial  machine  state  and 
the  proof  of  safety.  The  proof  of  safety  can  be  further  divided  into 
two  pieces:  one  is  the  proof  of  the  Preservation  and  Progress  theorems 
and  the  other  is  the  proof  that  the  initial  machine  state  satisfies  the 
Initial  Condition  property.  Note  that  the  proofs  of  Preservation  and 
Progress  do  not  change  for  any  machine  state  which  has  been  generated 
by  compiling  an  FTAL  program.  Thus,  these  properties  need  only  be 
proven  once  and  can  then  be  reused  for  all  FPCC  packages  produced 
by  this  compiler. 

In the following sections, we first describe our Coq representation of
the machine and the encoding of the FTAL syntax, semantics, and
soundness theorems. Next we discuss the implementation of the formal
proofs of FPCC Preservation and Progress, which were done interactively
using the Coq proof assistant. Then, we describe a compiler which
parses an FTAL program, performs type-checking, and automatically
produces the Coq term representing the typing derivation. This typing
derivation is then used to construct the proof of the Initial Condition
property.

Coq is a proof assistant tool for the calculus of inductive constructions.
It provides an interactive interface for constructing formal proofs
in the logic. The Coq syntax for λ-abstraction, λX:A.B, is [X:A]B.
The syntax for dependent products, ΠX:A.B, is (X:A)B and Coq allows
for the normal arrow abbreviation of this when the bound variable
does not occur in the body, e.g. A->B. Coq syntax for inductive definitions
is exactly that described in Section 2.1. Coq uses the sort Prop
for logical propositions and the sort Set for the type of specifications
(booleans, natural numbers, lists, programs, etc.).

6.1.  Encoding  machine  semantics 

The Coq encoding of the machine to which FTAL programs are translated
is very similar to the presentation in Section 2.2. For example,
having defined the registers as an inductive set with 32 constructors,
we then define the memory and register file as being functions and the
state as a triple of memory, register file, and program counter:

Definition Word := nat.

Inductive _Reg : Set := _r0 : _Reg | _r1 : _Reg | ...

Definition Mem := Word -> Word.

Definition _RegFile := _Reg -> Word.

Definition State := (Mem * (_RegFile * Word)).


The  instruction  set  is  then  defined  as  an  inductive  definition  with 
appropriate  constructors: 


Inductive _Instr : Set
  := _add  : _Reg -> _Reg -> _Reg -> _Instr
   | _addi : _Reg -> _Reg -> Word -> _Instr
   | _movi : _Reg -> Word -> _Instr
   | _bgt  : _Reg -> _Reg -> Word -> _Instr
   | _jd   : Word -> _Instr
   | _jmp  : _Reg -> _Instr
   | _ld   : _Reg -> _Reg -> Word -> _Instr
   | _st   : _Reg -> Word -> _Reg -> _Instr
   | _ill  : _Instr.

We next decide on how to encode the instructions above as natural
numbers and write a Coq function which uses the appropriate
arithmetic operations to decode a natural number into an _Instr:

Definition  Dc  :  Word  ->  _Instr  :=  ... 

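The text leaves the concrete encoding behind Dc unspecified. Purely as an illustration of what "decoding with arithmetic operations" can look like, the Python sketch below packs an opcode, up to three register indices, and an optional word into one natural number. The opcode numbering, the field widths, the operand order, and the names `encode` and `decode` are all hypothetical choices of this sketch and should not be read as the encoding used in the Coq development.

```python
# Hypothetical opcode table; index 0 is the illegal instruction.
OPCODES = ["ill", "add", "addi", "movi", "bgt", "jd", "jmp", "ld", "st"]

# For each opcode: (number of register operands, whether a word operand follows).
SHAPES = {"ill": (0, False), "add": (3, False), "addi": (2, True),
          "movi": (1, True), "bgt": (2, True), "jd": (0, True),
          "jmp": (1, False), "ld": (2, True), "st": (2, True)}


def encode(op, regs=(), word=0):
    """Pack an instruction into a natural number (registers are below 32)."""
    w = word
    for r in reversed(regs):
        w = w * 32 + r
    return w * 16 + OPCODES.index(op)


def decode(w):
    """Inverse of encode; anything that is not a valid encoding decodes to ill."""
    code, rest = w % 16, w // 16
    if code >= len(OPCODES):
        return ("ill", (), 0)
    op = OPCODES[code]
    nregs, has_word = SHAPES[op]
    regs = []
    for _ in range(nregs):
        regs.append(rest % 32)
        rest //= 32
    if not has_word and rest != 0:
        return ("ill", (), 0)
    return (op, tuple(regs), rest if has_word else 0)
```

Because the decoder is total, every memory word denotes some instruction, which is what allows the Step function to be defined on arbitrary states.
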
We  are  now  ready  to  encode  the  semantics  of  the  machine  as  given 
in  Section  2.2.  For  updating  the  register  file  and  memory,  we  define 
auxiliary  functions,  as  in  the  code  below: 

Definition updateregfile
  : _RegFile -> _Reg -> Word -> _RegFile
  := [R:_RegFile; rd:_Reg; v:Word]
     ([r:_Reg] if (beq_reg r rd) then v else (R r)).

Definition Step : State -> State
  := [St:State] Cases St of (M, (R, pc)) =>
       Cases (Dc (M pc)) of
         (_add rd rs rs')
           => (M, ((updateregfile R rd
                      (plus (R rs) (R rs'))),
                   (S pc)))
       | (_jd l)
           => (M, (R, l))
       | ...
       | _ill => St
       end
     end.

Finally, we can state the safety policy we wish to enforce and define
what a safe machine state is. The MultiStep function simply applies
the Step function to the given state n times:

Definition SP [S:State]
  := (let (M,T')=S in
       (let (R,PC)=T' in
         ~(Dc (M PC))=_ill)).

Definition Safe [S:State]
  := (n:nat)(SP (MultiStep n S)).

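For readers less familiar with Coq, the definitions above can be mirrored in a few lines of Python. This is a simplified sketch under stated assumptions: memory holds already-decoded instructions (so Dc is skipped), only add and jd are modeled and everything else is treated as illegal, and because Safe quantifies over all n, the function `safe_up_to` can only check the policy up to a finite bound, which is weaker than the Coq proposition.

```python
ILL = ("ill",)


def step(state):
    """One machine step on (memory, regs, pc)."""
    memory, regs, pc = state
    instr = memory.get(pc, ILL)
    if instr[0] == "add":
        _, rd, rs, rt = instr
        return memory, {**regs, rd: regs[rs] + regs[rt]}, pc + 1
    if instr[0] == "jd":
        return memory, regs, instr[1]
    return state          # illegal instruction: the state is left unchanged


def multi_step(n, state):
    """Apply step to the given state n times."""
    for _ in range(n):
        state = step(state)
    return state


def sp(state):
    """Safety policy: the program counter does not point to an illegal instruction."""
    memory, _, pc = state
    return memory.get(pc, ILL) != ILL


def safe_up_to(bound, state):
    """Bounded approximation of Safe: SP holds after 0, 1, ..., bound steps."""
    return all(sp(multi_step(n, state)) for n in range(bound + 1))


loop = {0: ("add", "r1", "r1", "r2"), 1: ("jd", 0)}
s0 = (loop, {"r1": 0, "r2": 2}, 0)
print(safe_up_to(10, s0))  # prints True
```

A program that runs off the end of its code fails the bounded check at the first step whose program counter leaves the code region.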

6.2.  Encoding  FTAL  syntax 


Encoding  the  FTAL  language  is  a  more  involved  process.  We  start  by 
defining  each  syntactic  category  as  an  inductive  type.  For  example,  the 
FTAL  types  are  encoded  as  follows: 

Definition initflag := bool.

Inductive Omega : Set
  := intty  : Omega
   | codety : (Map Reg Omega) -> APTy -> Omega
   | tupty  : (list Omega) -> (list initflag) -> Omega
   | recty  : (OmegaL (S O)) -> Omega.

The  list  in  the  tuple  type  constructor  is  the  usual  definition  of  a 
list,  found  in  the  Coq  library.  Hence,  the  tuple  type  constructor  takes 
as  arguments  a  list  of  types  and  a  list  of  initialization  flags  (booleans). 
Map  is  defined  as  a  list  of  pairs.  The  type  of  a  register  file  (used  by 
codety)  is  a  map  from  registers  (definition  presented  below)  to  types. 
We  also  define  a  “well-formed  Map”,  used  later,  as  being  a  list  of  pairs 
in  which  the  first  element  of  every  pair  in  the  list  is  distinct  from  all 
others. 

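The distinction between a Map and a well-formed Map is easy to state executably. The following Python sketch is illustrative only; the names `is_well_formed` and `lookup` are hypothetical and do not correspond to identifiers in the Coq development.

```python
def is_well_formed(m):
    """A map (list of pairs) is well-formed if all of its keys are distinct."""
    keys = [k for k, _ in m]
    return len(keys) == len(set(keys))


def lookup(m, key):
    """Return the value of the first pair whose key matches, or None."""
    for k, v in m:
        if k == key:
            return v
    return None
```

On a well-formed map the order of the pairs does not affect `lookup`, whereas on an arbitrary list of pairs an earlier binding would shadow a later one.
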
A  well-formed  type  in  the  FTAL  language  will  never  have  free  type 
variables,  but  variables  may  appear  in  a  recursive  type.  Hence,  we 
represent  the  type  under  the  recursive  type  constructor  by  a  “lifted” 
version of Omega which uses de Bruijn indices to represent variables.
The  parameter  of  the  OmegaL  type  below  tracks  the  number  of  free 
type  variables  in  the  term  to  ensure  the  correctness  of  our  substitution 
and  unfolding  functions  for  recursive  types: 

Inductive OmegaL : nat -> Set
  := inttyL  : (OmegaL 0)
   | codetyL : (i:nat) (Map Reg (OmegaL i)) ->
                 APTy -> (OmegaL i)
   | tuptyL  : (i:nat) (list (OmegaL i)) ->
                 (list initflag) -> (OmegaL i)
   | rectyL  : (i:nat) (OmegaL (S i)) -> (OmegaL i)
   | varL    : (i:nat) (OmegaL (S i))
   | liftL   : (i:nat) (OmegaL i) -> (OmegaL (S i)).
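
To make the role of the level index concrete, here is a small Python sketch of the same var/lift style of de Bruijn representation, with a scoping check and a one-step unfolding of recursive types. It rests on several assumptions: the class and function names are hypothetical, code types and initialization flags are omitted, Omega and OmegaL are collapsed into one untyped datatype, and the level that Coq tracks in the type is checked at run time by well_scoped instead.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class IntTy:          # inttyL: only valid at level 0
    pass


@dataclass(frozen=True)
class TupTy:          # tuptyL: fields live at the same level as the tuple
    fields: Tuple["Ty", ...]


@dataclass(frozen=True)
class RecTy:          # rectyL: the body lives one level deeper
    body: "Ty"


@dataclass(frozen=True)
class Var:            # varL: the innermost bound variable
    pass


@dataclass(frozen=True)
class Lift:           # liftL: weakens a term from level i to level i+1
    inner: "Ty"


Ty = Union[IntTy, TupTy, RecTy, Var, Lift]


def well_scoped(t: Ty, level: int) -> bool:
    """True if t is a valid term with `level` type variables in scope."""
    if isinstance(t, IntTy):
        return level == 0
    if isinstance(t, TupTy):
        return all(well_scoped(f, level) for f in t.fields)
    if isinstance(t, RecTy):
        return well_scoped(t.body, level + 1)
    if isinstance(t, Var):
        return level >= 1
    return level >= 1 and well_scoped(t.inner, level - 1)  # Lift


def subst(t: Ty, s: Ty, depth: int) -> Ty:
    """Replace the variable bound `depth` binders above the innermost one by s."""
    if isinstance(t, TupTy):
        return TupTy(tuple(subst(f, s, depth) for f in t.fields))
    if isinstance(t, RecTy):
        return RecTy(subst(t.body, s, depth + 1))
    if isinstance(t, Var):
        return s if depth == 0 else t
    if isinstance(t, Lift):
        # At depth 0 the lifted term cannot mention the variable being replaced.
        return t.inner if depth == 0 else Lift(subst(t.inner, s, depth - 1))
    raise ValueError("ill-scoped term: IntTy must appear under Lift here")


def unfold(t: RecTy) -> Ty:
    """One-step unfolding: substitute the recursive type for its own variable."""
    return subst(t.body, t, 0)
```

For instance, a list-like type rec a. <int, a> is written RecTy(TupTy((Lift(IntTy()), Var()))), and unfolding it yields the tuple of int and the recursive type itself.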

Registers are defined as in the machine above. Unlike the presentation
in previous sections, we carry the special allocation pointer
separately from the rest of the register file, hence there are only
31 registers defined for FTAL. The r31 register, or AP below, is simply
a label (which is defined to be a natural number). The special allocation
pointer types are encoded as an inductive definition, and the types of
register files and heaps are maps from registers or labels, respectively,
to Omega (the heap type also requires that the map be well-formed, as
defined above):

Inductive Reg : Set := r0 : Reg | r1 : Reg | ...

Definition label := nat.

Definition AP := label.  (* alloc. ptr. (r31) *)

Inductive APTy : Set
  := fresh : APTy
   | used  : nat -> APTy.

Definition RegFileTy := (Map Reg Omega).

Definition HeapTy := (WFMap label Omega).

The  remainder  of  the  definitions  for  FTAL  syntax  are  fairly  intuitive 
and  match  closely  the  presentation  in  Figure  3,  except  that  r31  and  its 
type  are  carried  separately  as  AP  and  APTy: 

Inductive Instr : Set
  := add   : Reg -> Reg -> Reg -> Instr
   | addi  : Reg -> Reg -> int -> Instr
   | alloc : Reg -> ... -> Instr
   | bgt   : Reg -> ... -> Instr
   | bump  : int -> ...
   | fold  : Reg -> ... -> Instr
   | ld    : Reg -> ...
   | mov   : Reg -> ...
   | ...

Inductive InstrSeq : Set
  := iseq : Instr -> InstrSeq -> InstrSeq
   | jd   : label -> InstrSeq
   | jmp  : Reg -> InstrSeq.

Inductive WordVal : Set
  := wl      : label -> WordVal
   | wi      : int -> WordVal
   | wuninit : Omega -> WordVal
   | wfold   : WordVal -> Omega -> WordVal.

Inductive HeapVal : Set
  := tuple : (list WordVal) -> HeapVal
   | code  : RegFileTy -> APTy -> InstrSeq -> HeapVal.

Definition Heap := (WFMap label HeapVal).

Definition RegFile := (Map Reg WordVal).

Definition Program := (Heap * (RegFile * (AP * InstrSeq))).


6.3.  Encoding  FTAL  semantics  and  soundness 

Each judgment form of the dynamic and static semantics can be viewed
as a relation and is also encoded as an inductive definition. For every
evaluation or typing rule, there is an associated constructor of
the appropriate inductive definition. (This allows us to use Coq's
inductive elimination constructs to perform inversion and induction on
typing derivations.) We show the encoding of several evaluation rules
in Figure 13.

The reglookup and regupdext relations are to be read as propositions
stating that looking up the value of a given register in a register file
(which is defined as a Map) yields the given word value, and that updating
or extending the mapping of a register in a register file results in a new
register file, respectively. For the heap (and similarly the heap type;
both are defined as well-formed Maps), the hextend proposition requires
that the label being added to the domain of the heap is not already
mapped in the heap. The hupdate proposition holds only when
the label is in fact present in the heap mapping. These propositions are
defined inductively as relations on Maps.
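
The intended behavior of these relations can be sketched with ordinary functions over association lists. This is only an illustration under stated assumptions: the Coq development defines them as inductive relations rather than functions, the Python names (wf_map, lookup, upd_ext, extend, update) are hypothetical, and a proposition that does not hold is modeled here by returning None.

```python
from typing import List, Optional, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
Map = List[Tuple[K, V]]  # a Map is a list of (key, value) pairs


def in_dom(m: Map, key: K) -> bool:
    return any(k == key for k, _ in m)


def wf_map(m: Map) -> bool:
    """Well-formed Map: the first component of every pair is distinct from all others."""
    keys = [k for k, _ in m]
    return all(k not in keys[:i] for i, k in enumerate(keys))


def lookup(m: Map, key: K) -> Optional[V]:
    """reglookup / hlookup: the value bound to key, or None if key is unmapped."""
    for k, v in m:
        if k == key:
            return v
    return None


def upd_ext(m: Map, key: K, val: V) -> Map:
    """regupdext: update the binding if key is present, otherwise extend the map."""
    if in_dom(m, key):
        return [(k, val if k == key else v) for k, v in m]
    return m + [(key, val)]


def extend(m: Map, key: K, val: V) -> Optional[Map]:
    """hextend: defined only when key is not already mapped."""
    return None if in_dom(m, key) else m + [(key, val)]


def update(m: Map, key: K, val: V) -> Optional[Map]:
    """hupdate: defined only when key is already mapped."""
    if not in_dom(m, key):
        return None
    return [(k, val if k == key else v) for k, v in m]
```

Because extend refuses to add a key that is already present, it maps well-formed maps to well-formed maps, which is the reason heaps and heap types use it rather than the more permissive upd_ext.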

The  encodings  of  the  main  static  judgments  are  given  in  Figures  14 
and  15. 

In order to formally prove the soundness of FTAL as encoded
above, we proceed by first proving the same lemmas that are listed
in Section 4.5. The statements of these lemmas in Coq, while slightly
verbose, are essentially the same as those listed in that section. We
generate the proofs of these lemmas interactively using Coq proof
“tactics.” The tactics of the proof assistant correspond closely to the
steps that would be used in a hand proof, e.g. induction, inversion,
rewriting, application of rules (constructors), etc. We present the
statements of a few of these lemmas in Coq below (Register File Update,
the second case of Canonical Word Forms, and several cases of the Heap
Extension lemma):


Lemma regfile_update
  : (HT:HeapTy; R,R':RegFile; G,G':(Map Reg Omega))
    (rd:Reg; v:WordVal; t:Omega)
    (WFRegFile HT R G) ->
    (WFWordVal HT v t) ->
    (regupdext R rd v R') ->
    (regupdext G rd t G') ->
    (WFRegFile HT R' G').

Lemma can_word_forms_code
  : (H:Heap; HT:HeapTy; v:WordVal; G:RegFileTy; T:APTy)
    (WFHeap H HT) ->
    (WFWordVal HT v (codety G T)) ->
    (EX l | v=(wl l) /\ (EX I | (hlookup H l (code G T I)))).

Lemma heap_ext_2
  : (H,H':Heap; HT,HT':HeapTy; t:Omega; l:label)
    (v:WordVal; t':Omega)
    (WFHeap H HT) ->
    (hsize H l) ->
    (htextend HT l t HT') ->
    (WFWordVal HT v t') ->
    (WFWordVal HT' v t').

Lemma heap_ext_4
  : (I:InstrSeq)
    (H,H':Heap; HT,HT':HeapTy; t:Omega; l:label)
    (R:RegFileTy; A:APTy)
    (WFHeap H HT) ->
    (hsize H l) ->
    (htextend HT l t HT') ->
    (WFInstrSeq HT R A I) ->
    (WFInstrSeq HT' R A I).

Lemma heap_ext_7
  : (H,H':Heap; HT,HT':HeapTy; t:Omega; l:label)
    (h:HeapVal)
    (WFHeap H HT) ->
    (hsize H l) ->
    (htextend HT l t HT') ->
    (hextend H l h H') ->
    (WFHeapVal HT' h t) ->
    (WFHeap H' HT').

The  main  theorems  for  the  soundness  of  FTAL,  preservation  and 
progress,  follow  from  the  various  lemmas: 

Theorem ftal_preserv
  : (P,P':Program) (WFProgram P) -> (Eval P P') -> (WFProgram P').

Theorem ftal_progress
  : (P:Program) (WFProgram P) -> (EX P' | (Eval P P')).

We  have  now  completely  formalized  the  (syntactic)  soundness  proof 
of  FTAL.  In  the  next  section,  we  discuss  the  encoding  of  the  translation 
relations  between  FTAL  and  the  machine,  and  how  FTAL  soundness 
is  used  to  produce  the  proofs  of  the  FPCC  Preservation  and  Progress 
theorems. 

6.4.  Encoding  FPCC  Preservation  and  Progress 

The translation relations (not shown here) are represented as a set of
inductive definitions which follow precisely the presentation in Figure 11,
for example,

Inductive TrProgram
  : Program -> State -> Prop := ...

The global invariant for FPCC can be defined in terms of the
translation between a well-formed FTAL program and the machine state:

Definition Inv [S:State]
  := (EXT P:Program |
       (EXT D:(WFProgram P) |
         (TrProgram P S))).

Now we proceed to prove the FPCC Progress theorem:

Theorem Progress : (S:State) (Inv S) -> (SP S).

As mentioned in Section 5.3, the Progress theorem is straightforward.
Using several Coq “Inversion” tactics, we determine that there
exists a well-formed instruction sequence which translates to the
program counter of the state. Then we perform case analysis on the
well-formed instruction sequence judgment and show that in every
possible case, the program counter of the state must be pointing to a
non-illegal instruction.

Inductive Eval : Program -> Program -> Prop
  := ev_add
     : (H:Heap; R,R':RegFile; r31:AP; I':InstrSeq)
       (rd,rs,rs':Reg; rsval,rsval':int)
       (reglookup R rs (wi rsval)) ->
       (reglookup R rs' (wi rsval')) ->
       (regupdext R rd (wi (plus rsval rsval')) R') ->
       (Eval (H, (R, (r31, (iseq (add rd rs rs') I'))))
             (H, (R', (r31, I'))))
   | ev_alloc
     : (H,H':Heap; R,R':RegFile; r31:AP; I':InstrSeq)
       (rd:Reg; V:(list Omega))
       (regupdext R rd (wl r31) R') ->
       (hextend H r31 (tuple (makeUninitTup V)) H') ->
       (Eval (H, (R, (r31, (iseq (alloc rd V) I'))))
             (H', (R', (r31, I'))))
   | ev_bump
     : (H:Heap; R:RegFile; r31:AP; I':InstrSeq)
       (i:int; l:nat)
       (hsize H l) ->
       (Eval (H, (R, (r31, (iseq (bump i) I'))))
             (H, (R, (l, I'))))
   | ev_jd
     : (H:Heap; R:RegFile; r31:AP)
       (l:label; G:RegFileTy; T:APTy; I':InstrSeq)
       (hlookup H l (code G T I')) ->
       (Eval (H, (R, (r31, (jd l))))
             (H, (R, (r31, I'))))
   | ev_movl
     : (H:Heap; R,R':RegFile; r31:AP; I':InstrSeq)
       (rd:Reg; l:label)
       (regupdext R rd (wl l) R') ->
       (Eval (H, (R, (r31, (iseq (movl rd l) I'))))
             (H, (R', (r31, I'))))
   | ev_store
     : (H,H':Heap; R:RegFile; r31:AP; I':InstrSeq)
       (rd,rs:Reg; i:int; l:label;
        V,V':(list WordVal); w:WordVal)
       (reglookup R rd (wl l)) ->
       (reglookup R rs w) ->
       (hlookup H l (tuple V)) ->
       (updatetuple V i w V') ->
       (hupdate H l (tuple V') H') ->
       (Eval (H, (R, (r31, (iseq (st rd i rs) I'))))
             (H', (R, (r31, I'))))

Figure 13. Coq encoding of FTAL dynamic semantics

Next is the FPCC Preservation theorem, which is more involved to
prove but which follows the discussion in Section 5.3:

Theorem Preservation : (S:State) (Inv S) -> (Inv (Step S)).

With  these  two  theorems,  we  can  now  prove  that  a  machine  state  will 
be  safe  if  the  FPCC  Initial  Condition  property  is  satisfied: 

Theorem  Safety  :  (S: State)  (Inv  S)  ->  (Safe  S) . 
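
The way Safety follows from Progress and Preservation is the usual induction on the number of steps. The derivation below is a sketch of that argument, assuming that MultiStep unfolds as MultiStep (n+1) S = Step (MultiStep n S); the text only says that Step is applied n times, so the direction of unfolding is an assumption made for this illustration.

```latex
\begin{align*}
\textbf{Claim.}\quad
  & \forall n.\ \mathit{Inv}(S) \Rightarrow \mathit{Inv}(\mathit{MultiStep}\ n\ S) \\
n = 0:\quad
  & \mathit{MultiStep}\ 0\ S = S,\ \text{so } \mathit{Inv}(S) \text{ is the hypothesis itself.} \\
n = k + 1:\quad
  & \mathit{Inv}(\mathit{MultiStep}\ k\ S) \text{ holds by the induction hypothesis, hence} \\
  & \mathit{Inv}(\mathit{Step}\ (\mathit{MultiStep}\ k\ S))
    = \mathit{Inv}(\mathit{MultiStep}\ (k+1)\ S) \text{ by Preservation.} \\
\textbf{Safety.}\quad
  & \text{For every } n,\ \mathit{Inv}(\mathit{MultiStep}\ n\ S)
    \Rightarrow \mathit{SP}(\mathit{MultiStep}\ n\ S) \text{ by Progress,} \\
  & \text{which is exactly } \mathit{Safe}(S).
\end{align*}
```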

Inductive RegFileSubtype (* register file subtyping: G <= G' *)
  : RegFileTy -> RegFileTy -> Prop
  := weaken
     : (G,G':RegFileTy)
       ((r:Reg; t:Omega) (reglookup G' r t) -> (reglookup G r t)) ->
       (RegFileSubtype G G').

Inductive WFWordVal (* well-formed word values: HT |- w : t wval *)
  : HeapTy -> WordVal -> Omega -> Prop
  := int_wval : (HT:HeapTy; i:int)(WFWordVal HT (wi i) intty)
   | label_wval
     : (HT:HeapTy; l:label; t,t':Omega)
       (htlookup HT l t') ->
       (Subtype t' t) ->
       (WFWordVal HT (wl l) t)
   | fold_word_wval
     : (HT:HeapTy; w:WordVal; t:OmegaR; t':Omega)
       (RUnlift (RUnfold t))=t' ->
       (WFWordVal HT w t') ->
       (WFWordVal HT (wfold w (recty t)) (recty t)).

Inductive WFInstrSeq (* well-formed instruction sequences: HT; G |- I *)
  : HeapTy -> RegFileTy -> APTy -> InstrSeq -> Prop
  := s_add
     : (HT:HeapTy; G,G':RegFileTy; T:APTy; I:InstrSeq)
       (rd,rs,rs':Reg)
       (reglookup G rs intty) ->
       (reglookup G rs' intty) ->
       (regupdext G rd intty G') ->
       (WFInstrSeq HT G' T I) ->
       (WFInstrSeq HT G T (iseq (add rd rs rs') I))
   | s_alloc
     : (HT:HeapTy; G,G':RegFileTy; I:InstrSeq)
       (rd:Reg; n:nat; V:(list Omega))
       n=(length V) ->
       (regupdext G rd (tupty V (makeUninitTupty V)) G') ->
       (WFInstrSeq HT G' (used n) I) ->
       (WFInstrSeq HT G fresh (iseq (alloc rd V) I))
   | s_jd
     : (HT:HeapTy; G,G':RegFileTy; T:APTy)
       (l:label)
       (htlookup HT l (codety G' T)) ->
       (RegFileSubtype G G') ->
       (WFInstrSeq HT G T (jd l))
   | s_st
     : (HT:HeapTy; G,G':RegFileTy; T:APTy; I:InstrSeq)
       (rd,rs:Reg; i:int;
        V,V':(list initflag); Ts:(list Omega); t:Omega)
       (reglookup G rd (tupty Ts V)) ->
       (reglookup G rs t) ->
       (ListNth ? Ts i t) ->
       (updatetupty V i V') ->
       (regupdate G rd (tupty Ts V') G') ->
       (WFInstrSeq HT G' T I) ->
       (WFInstrSeq HT G T (iseq (st rd i rs) I))

Figure 14. Coq encoding of FTAL static semantics: main definitions (1 of 2)

Inductive WFHeapVal (* well-formed heap values: HT |- h : t hval *)
  : HeapTy -> HeapVal -> Omega -> Prop
  := tuple_wf
     : (HT:HeapTy; wl:(list WordVal); tl:(list Omega); il:(list initflag))
       (WFWordValinitList HT wl tl il) ->
       (WFHeapVal HT (tuple wl) (tupty tl il))
   | code_wf
     : (HT:HeapTy; G:RegFileTy; I:InstrSeq; T:APTy)
       (WFInstrSeq HT G T I) ->
       (WFHeapVal HT (code G T I) (codety G T)).

Inductive WFHeap (* well-formed heap *)
  : Heap -> HeapTy -> Prop
  := heap_wf
     : (H:Heap; HT:HeapTy)
       (EX s | (hsize H s) /\
               (htsize HT s) /\
               ((n:label; h:HeapVal) (hlookup H n h) -> (lt n s)) /\
               ((n:label; t:Omega) (htlookup HT n t) -> (lt n s)) /\
               ((n:label) (lt n s) -> (EX h | (hlookup H n h))) /\
               ((n:label) (lt n s) -> (EX t | (htlookup HT n t))) /\
               ((n:label; h:HeapVal; t:Omega)
                (hlookup H n h) -> (htlookup HT n t) -> (WFHeapVal HT h t)) /\
               (OrdHeap H)
       ) ->
       (WFHeap H HT).

Inductive WFRegFile (* well-formed register file *)
  : HeapTy -> RegFile -> RegFileTy -> Prop
  := regfile_wf
     : (HT:HeapTy; R:RegFile; G:RegFileTy)
       ((r:Reg; t:Omega)
        (reglookup G r t) ->
        (EX w | (reglookup R r w) /\ (WFWordVal HT w t))) ->
       ((r:Reg; v:WordVal; l:label; n:nat)
        (reglookup R r v) ->
        (stripWV v)=(wl l) ->
        (htsize HT n) ->
        (lt l n)) ->
       (WFRegFile HT R G).

Inductive WFProgram (* well-formed program *)
  : Program -> Prop
  := program_wf
     : (H:Heap; HT:HeapTy; R:RegFile; G:RegFileTy;
        l:AP; t:APTy; I:InstrSeq)
       (WFHeap H HT) ->
       (WFRegFile HT R G) ->
       (WFap HT l t) ->
       (WFInstrSeq HT G t I) ->
       (EX l | (EX G' | (EX T' | (EX I' | (EX n |
          (hlookup H l (code G' T' I')) /\
          (ISubDepth I I' n)))))) ->
       (WFProgram (H, (R, (l, I)))).

Figure 15. Coq encoding of FTAL static semantics: main definitions (2 of 2)

6.5.  Generating  the  Initial  Condition

In order to generate the Initial Condition, we use a compiler that takes
an FTAL program and compiles it to a machine state, producing the
necessary proofs in the process. The structure of this compiler is fairly
straightforward: After parsing an FTAL source file, type-checking is
performed. The algorithm for type-checking follows closely the
structure of the inductively defined static semantics in Coq. (Similarly,
the compiler structures for FTAL abstract syntax mirror the Coq
encoding.) Thus, the type-checker, as it analyzes the FTAL program,
simultaneously builds a Coq term representing the proof of well-formedness
of the program. In particular, if P:Program, then the type-checking
phase produces a term D:(WFProgram P).

Once  type-checking  is  successfully  completed,  the  compiler  then 
translates  the  FTAL  program  into  a  machine  state.  Again,  this  is 
done  in  such  a  manner  that  a  Coq  term  representing  the  machine  state 
and  the  proof  of  the  relation  between  the  FTAL  program  and  the 
machine  state  can  be  generated.  That  is,  for  some  S:  State,  a  term, 
T:  (TrProgram  P  S),  is  constructed.  Along  with  the  typing  derivation 
term  of  P  produced  above,  we  can  now  construct  a  proof  that  the  global 
invariant  holds  on  S.  This  can  then  be  composed  with  the  Safety 
theorem  of  the  previous  section  to  produce  a  complete  proof  of  the 
safety  of  the  machine  state  S,  as  specified  by  our  safety  policy. 
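
The idea of a type-checker that builds the typing derivation as it goes can be illustrated with a deliberately tiny Python sketch. It is not the compiler described here: it handles only add and jd, ignores the allocation-pointer type, represents types and derivations as plain Python values rather than Coq terms, and the function name check_iseq is hypothetical. The derivation it returns is a nested tuple whose tags mirror the constructor names s_add and s_jd of Figure 14.

```python
from typing import Dict, Tuple

RegFileTy = Dict[str, str]       # register name -> type; only "int" is modeled
HeapTy = Dict[int, RegFileTy]    # code label -> register file type it expects


def check_iseq(ht: HeapTy, g: RegFileTy, iseq: Tuple) -> Tuple:
    """Type-check an instruction sequence and return its derivation tree.

    Instruction sequences are ("iseq", instr, rest) or ("jd", label);
    the only instruction modeled is ("add", rd, rs, rs2).
    Raises TypeError when no typing rule applies.
    """
    if iseq[0] == "jd":
        _, label = iseq
        target = ht.get(label)
        if target is None:
            raise TypeError(f"label {label} has no code type")
        # Register file subtyping: everything the target expects must be in g.
        if any(g.get(r) != t for r, t in target.items()):
            raise TypeError(f"register file does not match label {label}")
        return ("s_jd", label)
    if iseq[0] == "iseq":
        _, instr, rest = iseq
        if instr[0] == "add":
            _, rd, rs, rs2 = instr
            if g.get(rs) != "int" or g.get(rs2) != "int":
                raise TypeError("add needs two int source registers")
            g2 = dict(g)
            g2[rd] = "int"  # update or extend the register file type
            return ("s_add", rd, rs, rs2, check_iseq(ht, g2, rest))
    raise TypeError(f"unsupported construct: {iseq[0]}")
```

Each successful recursive call corresponds to one premise of the matching typing rule, so the returned tree has the same shape as the derivation that a proof term of type (WFInstrSeq HT G T I) would encode.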

6.6.  The  complete  system 

We now have a complete system which starts with a typed assembly
language program and compiles it into an FPCC package, consisting of
an initial machine state and a proof of safety. Although our current
implementation is not as realistic as [7, 5], the advantages of the syntactic
FPCC approach are still clear. We compare the syntactic and semantic
approaches to FPCC in detail in Section 7.

With  respect  to  PCC  implementations  in  general,  the  two  most 
practical  considerations  are  the  extent  of  the  trusted  computing  base 
(TCB)  and  the  size  of  the  proofs  that  are  shipped  with  code.  As  for 
the  former,  the  TCB  of  our  syntactic  FPCC  implementation  would 
consist  of  the  following:  (1)  a  parser,  which  converts  the  state  of  the 
raw  machine  into  the  encoding  in  the  logic;  (2)  the  encoding  of  the 
machine  step  function  in  the  logic,  which  must  accurately  capture  the 
semantics  of  the  real  machine  (that  is,  it  must  be  adequate);  and  (3) 
the  proof-checker  of  the  logic.  The  first  two  will  necessarily  exist  in  any 
PCC  system.  For  syntactic  FPCC,  the  proof-checker  is  smaller  and 
more  reliable  than  that  of  existing  PCC  systems  because  the  logic  used
is  much  simpler.  In  addition,  the  VCgen  is  completely  eliminated  from 
the  system. 

Regarding the proofs that are shipped with syntactic FPCC packages,
note that a large portion of the safety proof is static: the Progress
and Preservation theorems hold regardless of the particular FTAL
program from which the machine state was compiled. Hence, this part
of the proof does not need to be re-supplied (or even re-checked) with
every individual FPCC package. Furthermore, the remaining portion
of the proof simply consists of the initial FTAL program and its typing
derivation. The typing derivation can be easily and quickly generated by
either the code producer or consumer. Hence, if proof size is especially
critical, the only additional information that needs to be supplied with
the initial machine state is the FTAL program itself.


7.  Syntactic  vs.  Semantic  FPCC 

We  have  found  that  the  choice  between  the  syntactic  and  semantic 
approaches  to  generating  FPCC  involves  some  trade-offs,  which  we 
briefly  outline  in  this  section. 

In  previous  work  on  FPCC  [5,  3],  type  judgments  were  assigned  a 
meaning  (a  semantic  truth  value).  In  other  words,  each  type  of  the 
typed  assembly  language  is  viewed  as  a  predicate  to  be  applied  to 
memory,  a  value,  and  perhaps  more  arguments.  The  TAL  typing  rules 
then  become  lemmas  to  be  proved  in  this  semantic  model.  In  contrast, 
the  syntactic  approach  does  not  attempt  to  give  any  meaning  to  types 
or  typing  rules.  The  entire  typing  derivation  of  a  TAL  (or  FTAL) 
program  is  formalized  and  directly  encoded  in  the  logic.  The  FPCC 
safety  proof  is  generated  based  on  the  similarly  formalized  soundness 
proof for TAL. Note, however, that unlike the original PCC systems, the
typing rules are not part of the trusted base of our system: they must
be encoded and their soundness proved using only the foundations
of the logic.

The difference discussed in the previous paragraph is clearly exhibited
in the nature of the invariants that are generated for the two
approaches. In the semantic approach, the global invariant used to
prove safety, although it may be derived from the type system of a
typed assembly language, actually states properties directly about the
machine state: the contents of registers, memory addresses, and so
on. This approach is very general and the invariant can be used to
express arbitrary properties about the machine. On the other hand,
the invariant used for the syntactic approach does not prove properties
directly about the machine state. Instead, it simply requires that there
exist a correspondence between the machine state and an assembly
program. The safety and soundness of the assembly language (which is
easier to prove) is used to ensure safety of the machine code.

The most obvious feature of the syntactic approach to FPCC is
the resulting simplicity of the overall system. The complexities evident
in [3, 6, 1, 2] do not arise in our system. For example, in order to
support contravariant recursive types, an “indexed” semantic model
is necessary, which complicates the definition of types and requires
tedious reasoning about steps of computation. A more serious
limitation of current semantic approaches to FPCC is the difficulty of
modeling mutable record fields. This is a consequence of circularity in the
definition of a “type” as a predicate on a state that is a pair of memory
and a set of allocated addresses [3]. A third issue which has yet to be
addressed by the semantic model of types is supporting a type system
with higher-order kinds. These, and various other difficulties in the
semantic approach, result from attempting to give a meaning to types.

The reason why our approach does not suffer from the same
complexity is that it only needs to give a meaning to types one step at a
time. For example, in a semantic approach, when trying to show that
two mutually-recursive functions f and g satisfy the predicates for their
function types, we have the problem that the proof for f needs the proof
for g and vice-versa. Resolving this circularity requires a coinduction
principle or forces the use of an “indexed” semantic model. On the
other hand, a syntactic approach will simply provide a typing rule for
mutually-recursive functions. Of course, the soundness proof still needs
to show that the typing rule is meaningful, but it only needs to do it
one step at a time, in which case the circularity is gone: we do not need
to assume anything about g in order to show that the first instruction
of f can be executed safely. Only when we reach the call to g need
we pay attention to it, but at that point we do not need to assume
anything about f any more. Another way to look at it is that the
“indexing” is done implicitly, for free, when we combine the progress
and preservation lemmas to get the actual safety proof.

Despite the overall simplicity of the approach to FPCC given in
this article, it is not without potential technical intricacies. One of
the most critical of these is the encoding of the syntactic typing rules
and the soundness proof. In our prototype Coq “implementation” we
have indeed been able to completely formalize and encode the static
and operational semantics of FTAL, as well as prove the progress and
preservation theorems. Although the encoding is not entirely trivial, it
was achieved with reasonable effort. (In particular, the current
implementation of the proofs of FTAL soundness and the FPCC
Preservation and Progress theorems was completed within several months by a
single graduate student with no previous experience in Coq or CiC.)
The ability in CiC to perform eliminations on inductive definitions
means that most proofs are quite straightforward and are proven using
an intuitive sequence of steps. The fact that these proofs are generated
interactively (i.e., manually) is not an issue because it only needs to be
done once.

Finally, our approach relies on the availability of a typed assembly
language that is similar to the machine for which proofs will be
generated. It is also necessary that the type system capture all the invariants
needed to prove soundness of the machine code. In this article, we took
the interesting step of splitting the conventional malloc instruction of
TAL into two separate instructions (alloc and bump), each of which is
directly translated into a single machine instruction. We thus needed
to refine the type system so that the information about the allocation
state is correctly maintained in the invariant during translation. In
general, whatever criteria are specified by the safety policy (i.e., in the
definition of SP(S)) will need to be reflected in the type system.


8.  Related  Work 

The original PCC system was designed by Necula and Lee [17, 15, 16],
as discussed in our introduction. In addition to the general framework
laid out in their work, implementation effort on building a certifying
compiler has also been carried out [18, 7]. As also mentioned
previously, however, these existing certifying compilers and clients are
very language-specific and incorporate “built-in” understanding of a
particular type-system into the logic.

Our source language, FTAL, is derived from the typed assembly
language framework designed by Morrisett et al. [14]. Although, in
contrast with PCC, typed assembly language does not deal with code
at the lowest level of the machine, it is a critical tool which makes
automatic generation of PCC proofs possible, following either the syntactic
or the semantic approach.

Appel  and  Felty  were  the  first  to  propose  the  notion  of  foundational 
PCC  [5,  3].  Work  on  the  semantic  approach  to  FPCC  has  been  carried 
out by Appel, Felty, and others [5, 6, 1, 12, 4].

In  a  recent  paper,  Shao  et  al.  [20]  showed  how  to  incorporate  a  logic 
such  as  CiC  into  a  typed  intermediate  language.  Together  with  the  work 
described  in  this  article,  we  can  now  build  an  end-to-end  compiler  that
compiles  high-level  richly  typed  programs  into  FPCC. 

Lastly,  the  syntactic  approach  to  proving  type  soundness,  an  idea 
which  we  take  advantage  of  in  this  article,  was  introduced  by  Wright 
and  Felleisen  [25]. 


9.  Conclusion 

This article presents an approach for producing foundational
proof-carrying code based on syntactic soundness proofs. Starting with a
type system for a typed assembly language, we formally encode its
soundness proof and show a precise correspondence between TAL and
the language of the actual machine. We use this (syntactic)
correspondence, along with the proof that the type system enforces the invariants
or constraints of the safety policy, to generate a package consisting of
machine code and its proof of safety. By avoiding semantic modeling
of types as in previous approaches, our framework for constructing
foundational proofs is much simpler and more straightforward.